Performance of compressed inverted list caching in search engines

Jiangong Zhang, Xiaohui Long, Torsten Suel

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Due to the rapid growth in the size of the web, web search engines are facing enormous performance challenges. The larger engines in particular have to be able to process tens of thousands of queries per second on tens of billions of documents, making query throughput a critical issue. To satisfy this heavy workload, search engines use a variety of performance optimizations including index compression, caching, and early termination. We focus on two techniques, inverted index compression and index caching, which play a crucial role in web search engines as well as other high-performance information retrieval systems. We perform a comparison and evaluation of several inverted list compression algorithms, including new variants of existing algorithms that have not been studied before. We then evaluate different inverted list caching policies on large query traces, and finally study the possible performance benefits of combining compression and caching. The overall goal of this paper is to provide an updated discussion and evaluation of these two techniques, and to show how to select the best set of approaches and settings depending on parameters such as disk speed and main memory cache size.

    Original language: English (US)
    Title of host publication: Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08
    Pages: 387-396
    Number of pages: 10
    DOIs: https://doi.org/10.1145/1367497.1367550
    State: Published - 2008
    Event: 17th International Conference on World Wide Web 2008, WWW'08 - Beijing, China
    Duration: Apr 21 2008 → Apr 25 2008

    Other

    Other: 17th International Conference on World Wide Web 2008, WWW'08
    Country: China
    City: Beijing
    Period: 4/21/08 → 4/25/08

    Fingerprint

    Search engines
    Cache memory
    Information retrieval systems
    World Wide Web
    Throughput
    Engines

    Keywords

    • Index caching
    • Index compression
    • Inverted index
    • Search engines

    ASJC Scopus subject areas

    • Computer Networks and Communications

    Cite this

    Zhang, J., Long, X., & Suel, T. (2008). Performance of compressed inverted list caching in search engines. In Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08 (pp. 387-396) https://doi.org/10.1145/1367497.1367550

    @inproceedings{97a9109babb945ec964414538fb8f2cf,
    title = "Performance of compressed inverted list caching in search engines",
    abstract = "Due to the rapid growth in the size of the web, web search engines are facing enormous performance challenges. The larger engines in particular have to be able to process tens of thousands of queries per second on tens of billions of documents, making query throughput a critical issue. To satisfy this heavy workload, search engines use a variety of performance optimizations including index compression, caching, and early termination. We focus on two techniques, inverted index compression and index caching, which play a crucial role in web search engines as well as other high-performance information retrieval systems. We perform a comparison and evaluation of several inverted list compression algorithms, including new variants of existing algorithms that have not been studied before. We then evaluate different inverted list caching policies on large query traces, and finally study the possible performance benefits of combining compression and caching. The overall goal of this paper is to provide an updated discussion and evaluation of these two techniques, and to show how to select the best set of approaches and settings depending on parameters such as disk speed and main memory cache size.",
    keywords = "Index caching, Index compression, Inverted index, Search engines",
    author = "Jiangong Zhang and Xiaohui Long and Torsten Suel",
    year = "2008",
    doi = "10.1145/1367497.1367550",
    language = "English (US)",
    isbn = "9781605580852",
    pages = "387--396",
    booktitle = "Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08",

    }

    TY - GEN

    T1 - Performance of compressed inverted list caching in search engines

    AU - Zhang, Jiangong

    AU - Long, Xiaohui

    AU - Suel, Torsten

    PY - 2008

    Y1 - 2008

    AB - Due to the rapid growth in the size of the web, web search engines are facing enormous performance challenges. The larger engines in particular have to be able to process tens of thousands of queries per second on tens of billions of documents, making query throughput a critical issue. To satisfy this heavy workload, search engines use a variety of performance optimizations including index compression, caching, and early termination. We focus on two techniques, inverted index compression and index caching, which play a crucial role in web search engines as well as other high-performance information retrieval systems. We perform a comparison and evaluation of several inverted list compression algorithms, including new variants of existing algorithms that have not been studied before. We then evaluate different inverted list caching policies on large query traces, and finally study the possible performance benefits of combining compression and caching. The overall goal of this paper is to provide an updated discussion and evaluation of these two techniques, and to show how to select the best set of approaches and settings depending on parameters such as disk speed and main memory cache size.

    KW - Index caching

    KW - Index compression

    KW - Inverted index

    KW - Search engines

    UR - http://www.scopus.com/inward/record.url?scp=55149106898&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=55149106898&partnerID=8YFLogxK

    U2 - 10.1145/1367497.1367550

    DO - 10.1145/1367497.1367550

    M3 - Conference contribution

    SN - 9781605580852

    SP - 387

    EP - 396

    BT - Proceeding of the 17th International Conference on World Wide Web 2008, WWW'08

    ER -