A candidate filtering mechanism for fast top-K query processing on modern CPUs

Constantinos Dimopoulos, Sergey Nepomnyachiy, Torsten Suel

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    Abstract

    A large amount of research has focused on faster methods for finding top-k results in large document collections, one of the main scalability challenges for web search engines. In this paper, we propose a method for accelerating such top-k queries that builds on and generalizes methods recently proposed by several groups of researchers based on Block-Max Indexes [15, 10, 13]. In particular, we describe a system that uses a new filtering mechanism, based on a combination of block maxima and bitmaps, that radically reduces the number of documents that have to be further evaluated. Our filtering mechanism exploits the SIMD processing capabilities of current microprocessors, and it is optimized through caching policies that select and store suitable filter structures based on properties of the query load. Our experimental evaluation shows that the mechanism results in very significant speed-ups for disjunctive top-k queries under several state-of-the-art algorithms, including a speed-up of more than a factor of 2 over the fastest previously known methods.
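The filtering idea in the abstract (block maxima combined with bitmaps to discard documents before full evaluation) can be illustrated with a minimal sketch. This is not the authors' implementation: the block size, the function names `build_block_max` and `live_blocks`, and the toy scores are all assumptions made for illustration, and the sketch is scalar Python rather than SIMD code. It only shows the general principle that, for a disjunctive query, a docID block whose summed per-term maxima do not exceed the current top-k threshold cannot contain a qualifying document.

```python
from typing import Dict, List

BLOCK_SIZE = 64  # hypothetical docID block width, chosen only for this example


def build_block_max(postings: Dict[int, float], num_docs: int,
                    block_size: int = BLOCK_SIZE) -> List[float]:
    """Per-block maximum score for one term; postings maps docID -> score."""
    num_blocks = (num_docs + block_size - 1) // block_size
    maxima = [0.0] * num_blocks
    for doc_id, score in postings.items():
        block = doc_id // block_size
        if score > maxima[block]:
            maxima[block] = score
    return maxima


def live_blocks(block_maxima: List[List[float]], threshold: float) -> int:
    """Return a bitmap (as an int) with bit b set when block b may still
    hold a document scoring above the threshold."""
    bitmap = 0
    for block in range(len(block_maxima[0])):
        # Upper bound on any document score in this block for a disjunctive query.
        upper_bound = sum(term_maxima[block] for term_maxima in block_maxima)
        if upper_bound > threshold:
            bitmap |= 1 << block
    return bitmap


# Toy two-term query over 256 documents (4 blocks of 64 docIDs).
term_a = build_block_max({3: 1.5, 70: 0.4, 200: 2.0}, num_docs=256)
term_b = build_block_max({5: 1.0, 130: 0.3, 201: 1.2}, num_docs=256)
live = live_blocks([term_a, term_b], threshold=1.0)
print(format(live, "04b"))  # bits for blocks 3..0
```

With these toy scores only the first and last blocks remain live, so a query processor would skip the two middle blocks entirely. A production system would presumably compute the bounds over many blocks at once with vector instructions, as the abstract suggests, which this sketch does not attempt.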

    Original language: English (US)
    Title of host publication: SIGIR 2013 - Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval
    Pages: 723-732
    Number of pages: 10
    DOI: https://doi.org/10.1145/2484028.2484087
    State: Published - 2013
    Event: 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2013 - Dublin, Ireland
    Duration: Jul 28 2013 to Aug 1 2013

    Fingerprint

    Query processing
    Program processors
    Search engines
    Microprocessor chips
    Scalability
    Processing

    Keywords

    • Block-max inverted index
    • Candidate filtering mechanism
    • DocID-oriented block-max index
    • Early termination
    • Live area computation
    • Posting bitset
    • Top-k query processing

    ASJC Scopus subject areas

    • Computer Graphics and Computer-Aided Design
    • Information Systems

    Cite this

    Dimopoulos, C., Nepomnyachiy, S., & Suel, T. (2013). A candidate filtering mechanism for fast top-K query processing on modern CPUs. In SIGIR 2013 - Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 723-732). https://doi.org/10.1145/2484028.2484087

    @inproceedings{35fac6a493c448fdac1634e3e8ab5588,
    title = "A candidate filtering mechanism for fast top-K query processing on modern CPUs",
    abstract = "A large amount of research has focused on faster methods for finding top-k results in large document collections, one of the main scalability challenges for web search engines. In this paper, we propose a method for accelerating such top-k queries that builds on and generalizes methods recently proposed by several groups of researchers based on Block-Max Indexes [15, 10, 13]. In particular, we describe a system that uses a new filtering mechanism, based on a combination of block maxima and bitmaps, that radically reduces the number of documents that have to be further evaluated. Our filtering mechanism exploits the SIMD processing capabilities of current microprocessors, and it is optimized through caching policies that select and store suitable filter structures based on properties of the query load. Our experimental evaluation shows that the mechanism results in very significant speed-ups for disjunctive top-k queries under several state-of-the-art algorithms, including a speed-up of more than a factor of 2 over the fastest previously known methods.",
    keywords = "Block-max inverted index, Candidate filtering mechanism, DocID-oriented block-max index, Early termination, Live area computation, Posting bitset, Top-k query processing",
    author = "Constantinos Dimopoulos and Sergey Nepomnyachiy and Torsten Suel",
    year = "2013",
    doi = "10.1145/2484028.2484087",
    language = "English (US)",
    isbn = "9781450320344",
    pages = "723--732",
    booktitle = "SIGIR 2013 - Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval",

    }

    TY - GEN
    T1 - A candidate filtering mechanism for fast top-K query processing on modern CPUs
    AU - Dimopoulos, Constantinos
    AU - Nepomnyachiy, Sergey
    AU - Suel, Torsten
    PY - 2013
    Y1 - 2013
    AB - A large amount of research has focused on faster methods for finding top-k results in large document collections, one of the main scalability challenges for web search engines. In this paper, we propose a method for accelerating such top-k queries that builds on and generalizes methods recently proposed by several groups of researchers based on Block-Max Indexes [15, 10, 13]. In particular, we describe a system that uses a new filtering mechanism, based on a combination of block maxima and bitmaps, that radically reduces the number of documents that have to be further evaluated. Our filtering mechanism exploits the SIMD processing capabilities of current microprocessors, and it is optimized through caching policies that select and store suitable filter structures based on properties of the query load. Our experimental evaluation shows that the mechanism results in very significant speed-ups for disjunctive top-k queries under several state-of-the-art algorithms, including a speed-up of more than a factor of 2 over the fastest previously known methods.

    KW - Block-max inverted index
    KW - Candidate filtering mechanism
    KW - DocID-oriented block-max index
    KW - Early termination
    KW - Live area computation
    KW - Posting bitset
    KW - Top-k query processing
    UR - http://www.scopus.com/inward/record.url?scp=84883116336&partnerID=8YFLogxK
    UR - http://www.scopus.com/inward/citedby.url?scp=84883116336&partnerID=8YFLogxK
    U2 - 10.1145/2484028.2484087
    DO - 10.1145/2484028.2484087
    M3 - Conference contribution
    SN - 9781450320344
    SP - 723
    EP - 732
    BT - SIGIR 2013 - Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval
    ER -