Fast first-phase candidate generation for cascading rankers

Qi Wang, Constantinos Dimopoulos, Torsten Suel

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Current search engines use very complex ranking functions based on hundreds of features. While such functions return high-quality results, they create efficiency challenges as it is too costly to fully evaluate them on all documents in the union, or even intersection, of the query terms. To address this issue, search engines use a series of cascading rankers, starting with a very simple ranking function and then applying increasingly complex and expensive ranking functions on smaller and smaller sets of candidate results. Researchers have recently started studying several problems within this framework of query processing by cascading rankers; see, e.g., [5, 13, 17, 51]. We focus on one such problem, the design of the initial cascade. Thus, the goal is to very quickly identify a set of good candidate documents that should be passed to the second and further cascades. Previous work by Asadi and Lin [3, 5] showed that while a top-k computation on either the union or intersection gives good results, a further optimization using a global document ordering based on spam scores leads to a significant reduction in quality. Our contribution is to propose an alternative framework that builds specialized single-term and pairwise index structures, and then during query time selectively accesses these structures based on a cost budget and a set of early termination techniques. Using an end-to-end evaluation with a complex machine-learned ranker, we show that our approach finds candidates about an order of magnitude faster than a conjunctive top-k computation, while essentially matching the quality.
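    As a rough illustration of the cascading setup the abstract describes, the sketch below runs a cheap first phase (a conjunctive or disjunctive top-k over summed per-term scores) and hands the surviving candidates to a more expensive ranker. It is not the authors' method: the paper's specialized single-term and pairwise index structures, cost budget, and early termination techniques are not reproduced here. The toy index `INDEX`, the function names `first_phase` and `cascade`, and the `expensive_score` callback are all assumptions made for this example.

```python
import heapq

# Hypothetical toy inverted index: term -> {doc_id: cheap precomputed term score}.
INDEX = {
    "fast": {1: 1.2, 2: 0.4, 4: 0.9},
    "ranker": {1: 0.8, 3: 1.5, 4: 0.7},
}


def first_phase(query_terms, index, k, conjunctive=True):
    """Cheap candidate generation: sum per-term scores and keep the top k docs."""
    postings = [index.get(term, {}) for term in query_terms]
    if not postings:
        return []
    if conjunctive:
        # Intersection: only documents containing every query term.
        docs = set(postings[0])
        for plist in postings[1:]:
            docs &= set(plist)
    else:
        # Union: documents containing at least one query term.
        docs = set().union(*postings)
    scored = ((sum(p.get(doc, 0.0) for p in postings), doc) for doc in docs)
    return [doc for _, doc in heapq.nlargest(k, scored)]


def cascade(query_terms, index, k, expensive_score):
    """Two-stage cascade: cheap top-k selection, then an expensive re-ranking."""
    candidates = first_phase(query_terms, index, k)
    # The expensive ranker is evaluated only on the k surviving candidates.
    return sorted(candidates, key=expensive_score, reverse=True)
```

    The point of the design is that `expensive_score` is called at most k times per query, regardless of how many documents match the query terms; the first phase decides which documents get that budget.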

    Original language: English (US)
    Title of host publication: SIGIR 2016 - Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval
    Publisher: Association for Computing Machinery, Inc
    Pages: 295-304
    Number of pages: 10
    ISBN (Electronic): 9781450342902
    DOI: https://doi.org/10.1145/2911451.2911515
    State: Published - Jul 7 2016
    Event: 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2016 - Pisa, Italy
    Duration: Jul 17 2016 to Jul 21 2016

    Other

    Other: 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2016
    Country: Italy
    City: Pisa
    Period: 7/17/16 to 7/21/16

    Fingerprint

    Search engines
    Query processing
    Costs

    ASJC Scopus subject areas

    • Information Systems
    • Software

    Cite this

    Wang, Q., Dimopoulos, C., & Suel, T. (2016). Fast first-phase candidate generation for cascading rankers. In SIGIR 2016 - Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 295-304). Association for Computing Machinery, Inc. https://doi.org/10.1145/2911451.2911515

    @inproceedings{8f9ed24e285c4153a3bfa4f4ebff64a1,
    title = "Fast first-phase candidate generation for cascading rankers",
    author = "Qi Wang and Constantinos Dimopoulos and Torsten Suel",
    year = "2016",
    month = "7",
    day = "7",
    doi = "10.1145/2911451.2911515",
    language = "English (US)",
    pages = "295--304",
    booktitle = "SIGIR 2016 - Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval",
    publisher = "Association for Computing Machinery, Inc",
    }

    TY - GEN
    T1 - Fast first-phase candidate generation for cascading rankers
    AU - Wang, Qi
    AU - Dimopoulos, Constantinos
    AU - Suel, Torsten
    PY - 2016/7/7
    Y1 - 2016/7/7
    UR - http://www.scopus.com/inward/record.url?scp=84980325826&partnerID=8YFLogxK
    UR - http://www.scopus.com/inward/citedby.url?scp=84980325826&partnerID=8YFLogxK
    U2 - 10.1145/2911451.2911515
    DO - 10.1145/2911451.2911515
    M3 - Conference contribution
    SP - 295
    EP - 304
    BT - SIGIR 2016 - Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval
    PB - Association for Computing Machinery, Inc
    ER -