Optimizing top-k document retrieval strategies for block-max indexes

Constantinos Dimopoulos, Sergey Nepomnyachiy, Torsten Suel

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    Abstract

    Large web search engines use significant hardware and energy resources to process hundreds of millions of queries each day, and much research has focused on how to improve query processing efficiency. One general class of optimizations, called early termination techniques, is used in all major engines, and essentially involves computing top results without an exhaustive traversal and scoring of all potentially relevant index entries. Recent work in [9, 7] proposed several early termination algorithms for disjunctive top-k query processing, based on a new augmented index structure called Block-Max Index that enables aggressive skipping in the index. In this paper, we build on this work by studying new algorithms and optimizations for Block-Max indexes that achieve significant performance gains over the work in [9, 7]. We start by implementing and comparing Block-Max oriented algorithms based on the well-known Maxscore and WAND approaches. Then we study how to build better Block-Max index structures and design better index-traversal strategies, resulting in new algorithms that achieve a factor of 2 speed-up over the best results in [9] with acceptable space overheads. We also describe and evaluate a hierarchical algorithm for a new recursive Block-Max index structure.
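
The skipping idea behind a block-max index can be illustrated with a small sketch. This is not one of the paper's algorithms: it assumes a simplified docID-oriented layout in which the docID space is cut into fixed-size blocks, each term stores the maximum score of its postings in each block, and per-posting scores are precomputed and positive. The names `build_block_max`, `top_k`, and `BLOCK_SIZE`, and the toy data, are hypothetical.

```python
import heapq
from collections import defaultdict

BLOCK_SIZE = 4  # docIDs per block (assumed; chosen small for illustration)


def build_block_max(postings, block_size=BLOCK_SIZE):
    """Map each term to {block number: maximum score of its postings in that block}."""
    block_max = {}
    for term, plist in postings.items():
        maxes = defaultdict(float)
        for doc_id, score in plist.items():
            block = doc_id // block_size
            maxes[block] = max(maxes[block], score)
        block_max[term] = dict(maxes)
    return block_max


def top_k(postings, block_max, query, k, block_size=BLOCK_SIZE):
    """Disjunctive top-k: return (results, number of documents fully scored)."""
    heap = []   # min-heap of (score, doc_id); heap[0] holds the current threshold
    scored = 0
    blocks = sorted({b for t in query for b in block_max.get(t, {})})
    for block in blocks:
        # Upper bound on the score of any document in this block.
        upper = sum(block_max.get(t, {}).get(block, 0.0) for t in query)
        if len(heap) == k and upper <= heap[0][0]:
            continue  # no document here can enter the top k: skip the block
        for doc_id in range(block * block_size, (block + 1) * block_size):
            score = sum(postings.get(t, {}).get(doc_id, 0.0) for t in query)
            if score == 0.0:
                continue  # document contains none of the query terms
            scored += 1
            if len(heap) < k:
                heapq.heappush(heap, (score, doc_id))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, doc_id))
    return sorted(heap, reverse=True), scored


# Toy postings: term -> {doc_id: precomputed score} (made-up values).
postings = {
    "a": {0: 1.0, 1: 2.0, 5: 0.5, 9: 3.0, 13: 0.5},
    "b": {1: 1.5, 4: 0.5, 9: 2.0, 12: 0.25},
}
block_max = build_block_max(postings)
result, scored = top_k(postings, block_max, ["a", "b"], k=2)
print(result, scored)
```

In this toy run, two of the four blocks have an upper bound no larger than the current k-th best score, so their postings are never scored, while the returned top-k scores match those of an exhaustive evaluation.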

    Original language: English (US)
    Title of host publication: WSDM 2013 - Proceedings of the 6th ACM International Conference on Web Search and Data Mining
    Pages: 113-122
    Number of pages: 10
    DOIs: https://doi.org/10.1145/2433396.2433412
    State: Published - 2013
    Event: 6th ACM International Conference on Web Search and Data Mining, WSDM 2013 - Rome, Italy
    Duration: Feb 4 2013 - Feb 8 2013

    Other

    Other: 6th ACM International Conference on Web Search and Data Mining, WSDM 2013
    Country: Italy
    City: Rome
    Period: 2/4/13 - 2/8/13

    Keywords

    • block-max inverted index
    • docid-oriented block-max index
    • early termination
    • top-k query processing

    ASJC Scopus subject areas

    • Computer Networks and Communications
    • Computer Science Applications

    Cite this

    Dimopoulos, C., Nepomnyachiy, S., & Suel, T. (2013). Optimizing top-k document retrieval strategies for block-max indexes. In WSDM 2013 - Proceedings of the 6th ACM International Conference on Web Search and Data Mining (pp. 113-122) https://doi.org/10.1145/2433396.2433412

    @inproceedings{85dfa750270d4f58be40e414b0fcab77,
    title = "Optimizing top-k document retrieval strategies for block-max indexes",
    abstract = "Large web search engines use significant hardware and energy resources to process hundreds of millions of queries each day, and much research has focused on how to improve query processing efficiency. One general class of optimizations, called early termination techniques, is used in all major engines, and essentially involves computing top results without an exhaustive traversal and scoring of all potentially relevant index entries. Recent work in [9, 7] proposed several early termination algorithms for disjunctive top-k query processing, based on a new augmented index structure called Block-Max Index that enables aggressive skipping in the index. In this paper, we build on this work by studying new algorithms and optimizations for Block-Max indexes that achieve significant performance gains over the work in [9, 7]. We start by implementing and comparing Block-Max oriented algorithms based on the well-known Maxscore and WAND approaches. Then we study how to build better Block-Max index structures and design better index-traversal strategies, resulting in new algorithms that achieve a factor of 2 speed-up over the best results in [9] with acceptable space overheads. We also describe and evaluate a hierarchical algorithm for a new recursive Block-Max index structure.",
    keywords = "block-max inverted index, docid-oriented block-max index, early termination, top-k query processing",
    author = "Constantinos Dimopoulos and Sergey Nepomnyachiy and Torsten Suel",
    year = "2013",
    doi = "10.1145/2433396.2433412",
    language = "English (US)",
    isbn = "9781450318693",
    pages = "113--122",
    booktitle = "WSDM 2013 - Proceedings of the 6th ACM International Conference on Web Search and Data Mining",

    }

    TY - GEN

    T1 - Optimizing top-k document retrieval strategies for block-max indexes

    AU - Dimopoulos, Constantinos

    AU - Nepomnyachiy, Sergey

    AU - Suel, Torsten

    PY - 2013

    Y1 - 2013

    AB - Large web search engines use significant hardware and energy resources to process hundreds of millions of queries each day, and much research has focused on how to improve query processing efficiency. One general class of optimizations, called early termination techniques, is used in all major engines, and essentially involves computing top results without an exhaustive traversal and scoring of all potentially relevant index entries. Recent work in [9, 7] proposed several early termination algorithms for disjunctive top-k query processing, based on a new augmented index structure called Block-Max Index that enables aggressive skipping in the index. In this paper, we build on this work by studying new algorithms and optimizations for Block-Max indexes that achieve significant performance gains over the work in [9, 7]. We start by implementing and comparing Block-Max oriented algorithms based on the well-known Maxscore and WAND approaches. Then we study how to build better Block-Max index structures and design better index-traversal strategies, resulting in new algorithms that achieve a factor of 2 speed-up over the best results in [9] with acceptable space overheads. We also describe and evaluate a hierarchical algorithm for a new recursive Block-Max index structure.

    KW - block-max inverted index

    KW - docid-oriented block-max index

    KW - early termination

    KW - top-k query processing

    UR - http://www.scopus.com/inward/record.url?scp=84874262067&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=84874262067&partnerID=8YFLogxK

    U2 - 10.1145/2433396.2433412

    DO - 10.1145/2433396.2433412

    M3 - Conference contribution

    AN - SCOPUS:84874262067

    SN - 9781450318693

    SP - 113

    EP - 122

    BT - WSDM 2013 - Proceedings of the 6th ACM International Conference on Web Search and Data Mining

    ER -