To index or not to index: Time-space trade-offs in search engines with positional ranking functions

Diego Arroyuelo, Senén González, Mauricio Marin, Mauricio Oyarzún, Torsten Suel

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Positional ranking functions, widely used in Web search engines, improve result quality by exploiting the positions of the query terms within documents. However, it is well known that positional indexes demand large amounts of extra space, typically about three times the space of a basic nonpositional index. Textual data, on the other hand, is needed to produce text snippets. In this paper, we study time-space trade-offs for search engines with positional ranking functions and text snippet generation. We consider both index-based and non-index-based alternatives for positional data. We aim to answer the question of whether one should index positional data or not. We show that there is a wide range of practical time-space trade-offs. Moreover, we show that both position and textual data can be stored using about 71% of the space used by traditional positional indexes, with a minor increase in query time. This yields considerable space savings and outperforms, both in space and time, recent alternatives from the literature. We also propose several efficient compressed text representations for snippet generation, which are able to use about half of the space of current state-of-the-art alternatives with little impact on query processing time.
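
    The contrast between index-based and non-index-based positional data can be illustrated with a minimal sketch. This is not the paper's implementation: the tokenizer, the function names, and the minimum-span proximity measure are all assumptions chosen for illustration. The index-based path stores term positions ahead of time, while the non-index-based path recovers them by scanning the stored document text at query time, which is the same text a snippet generator would need anyway.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase word tokenizer (an assumption; real engines parse more carefully)."""
    return re.findall(r"\w+", text.lower())


def build_positional_index(docs):
    """Index-based alternative: term -> {doc_id -> [token positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def positions_from_index(index, doc_id, terms):
    """Look up precomputed positions of each query term in one document."""
    return {t: index.get(t, {}).get(doc_id, []) for t in terms}


def positions_from_text(docs, doc_id, terms):
    """Non-index-based alternative: scan the stored text at query time."""
    wanted = set(terms)
    found = {t: [] for t in terms}
    for pos, term in enumerate(tokenize(docs[doc_id])):
        if term in wanted:
            found[term].append(pos)
    return found


def min_span(positions):
    """Width in tokens of the smallest window containing every query term.

    Returns None if some term does not occur. This is one simple proximity
    signal, used here only as a stand-in for a positional ranking feature.
    """
    if any(not plist for plist in positions.values()):
        return None
    events = sorted((p, t) for t, plist in positions.items() for p in plist)
    need = len(positions)
    counts = defaultdict(int)
    covered = 0
    best = None
    left = 0
    for right_pos, right_term in events:
        counts[right_term] += 1
        if counts[right_term] == 1:
            covered += 1
        # Shrink the window from the left while it still covers all terms.
        while covered == need:
            left_pos, left_term = events[left]
            width = right_pos - left_pos + 1
            if best is None or width < best:
                best = width
            counts[left_term] -= 1
            if counts[left_term] == 0:
                covered -= 1
            left += 1
    return best
```

    Both paths return the same positions, so the ranking feature is identical; what differs is where the cost is paid. The index-based path spends space on stored positions, while the scanning path spends query time and is only practical when applied to a small set of candidate documents.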

    Original language: English (US)
    Title of host publication: SIGIR'12 - Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval
    Pages: 255-264
    Number of pages: 10
    DOI: https://doi.org/10.1145/2348283.2348320
    State: Published - 2012
    Event: 35th Annual ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2012 - Portland, OR, United States
    Duration: Aug 12 2012 to Aug 16 2012

    Other

    Other: 35th Annual ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2012
    Country: United States
    City: Portland, OR
    Period: 8/12/12 to 8/16/12

    Keywords

    • positional indexing
    • text compression for snippet generation

    ASJC Scopus subject areas

    • Information Systems

    Cite this

    Arroyuelo, D., González, S., Marin, M., Oyarzún, M., & Suel, T. (2012). To index or not to index: Time-space trade-offs in search engines with positional ranking functions. In SIGIR'12 - Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 255-264) https://doi.org/10.1145/2348283.2348320

    @inproceedings{f6d77318fa3040f790daba8a96232035,
    title = "To index or not to index: Time-space trade-offs in search engines with positional ranking functions",
    keywords = "positional indexing, text compression for snippet generation",
    author = "Diego Arroyuelo and Sen{\'e}n Gonz{\'a}lez and Mauricio Marin and Mauricio Oyarz{\'u}n and Torsten Suel",
    year = "2012",
    doi = "10.1145/2348283.2348320",
    language = "English (US)",
    isbn = "9781450316583",
    pages = "255--264",
    booktitle = "SIGIR'12 - Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval",

    }

    TY - GEN

    T1 - To index or not to index

    T2 - Time-space trade-offs in search engines with positional ranking functions

    AU - Arroyuelo, Diego

    AU - González, Senén

    AU - Marin, Mauricio

    AU - Oyarzún, Mauricio

    AU - Suel, Torsten

    PY - 2012

    Y1 - 2012

    KW - positional indexing

    KW - text compression for snippet generation

    UR - http://www.scopus.com/inward/record.url?scp=84866626346&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=84866626346&partnerID=8YFLogxK

    U2 - 10.1145/2348283.2348320

    DO - 10.1145/2348283.2348320

    M3 - Conference contribution

    AN - SCOPUS:84866626346

    SN - 9781450316583

    SP - 255

    EP - 264

    BT - SIGIR'12 - Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval

    ER -