Top-k aggregation using intersections of ranked inputs

Ravi Kumar, Kunal Punera, Torsten Suel, Sergei Vassilvitskii

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    Abstract

    There has been considerable past work on efficiently computing top k objects by aggregating information from multiple ranked lists of these objects. An important instance of this problem is query processing in search engines: One has to combine information from several different posting lists (rankings) of web pages (objects) to obtain the top k web pages to answer user queries. Two particularly well-studied approaches to achieve efficiency in top-k aggregation include early-termination algorithms (e.g., TA and NRA) and pre-aggregation of some of the input lists. However, there has been little work on a rigorous treatment of combining these approaches. We generalize the TA and NRA algorithms to the case when pre-aggregated intersection lists are available in addition to the original lists. We show that our versions of TA and NRA continue to remain "instance optimal," a very strong optimality notion that is a highlight of the original TA and NRA algorithms. Using an index of millions of web pages and real-world search engine queries, we empirically characterize the performance gains offered by our new algorithms. We show that the practical benefits of intersection lists can be fully realized only with an early-termination algorithm.
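For readers unfamiliar with the early-termination idea the abstract refers to, the following is a minimal sketch of the classic Threshold Algorithm (TA) only, not the generalized, intersection-aware version the paper proposes. It assumes sum aggregation, non-negative scores, and a score of 0 for an object missing from a list. The function name `threshold_algorithm` and the toy data in the usage note are hypothetical and chosen for illustration.

```python
import heapq
from typing import Dict, List, Tuple

RankedList = List[Tuple[str, float]]  # (object id, score), sorted by score descending


def threshold_algorithm(lists: List[RankedList], k: int) -> Tuple[List[Tuple[str, float]], int]:
    """Classic TA sketch with sum aggregation (assumed, not taken from the paper).

    Returns the top-k (object, total score) pairs and the depth of sorted
    access reached before stopping.
    """
    # Random-access index: one dictionary per input list.
    lookup: List[Dict[str, float]] = [dict(lst) for lst in lists]
    seen = set()
    top: List[Tuple[float, str]] = []  # min-heap holding the best k totals so far
    depth = 0
    max_depth = max((len(lst) for lst in lists), default=0)

    while depth < max_depth:
        threshold = 0.0
        for lst in lists:
            if depth >= len(lst):
                continue  # exhausted list: unseen objects score 0 here
            obj, score = lst[depth]
            threshold += score  # best score any unseen object could still have
            if obj not in seen:
                seen.add(obj)
                # Random access: fetch this object's score from every list.
                total = sum(lk.get(obj, 0.0) for lk in lookup)
                if len(top) < k:
                    heapq.heappush(top, (total, obj))
                elif total > top[0][0]:
                    heapq.heapreplace(top, (total, obj))
        depth += 1
        # Early termination: the k-th best total already meets the threshold.
        if len(top) == k and top[0][0] >= threshold:
            break

    ranked = sorted(((obj, total) for total, obj in top), key=lambda t: (-t[1], t[0]))
    return ranked, depth
```

With two toy posting lists, `[("d1", 9), ("d2", 8), ("d3", 2), ("d4", 1)]` and `[("d2", 9), ("d1", 7), ("d4", 3), ("d3", 1)]`, a call with `k=1` stops after reading two entries from each list, because the best total seen (17 for `d2`) already exceeds the threshold of 15 at that depth.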

    Original language: English (US)
    Title of host publication: Proceedings of the 2nd ACM International Conference on Web Search and Data Mining, WSDM'09
    Pages: 222-231
    Number of pages: 10
    DOI: https://doi.org/10.1145/1498759.1498830
    State: Published - 2009
    Event: 2nd ACM International Conference on Web Search and Data Mining, WSDM'09 - Barcelona, Spain
    Duration: Feb 9 2009 to Feb 12 2009

    Other

    Other: 2nd ACM International Conference on Web Search and Data Mining, WSDM'09
    Country: Spain
    City: Barcelona
    Period: 2/9/09 to 2/12/09

    Fingerprint

    Agglomeration
    Websites
    Search engines
    Query processing

    Keywords

    • Early-termination
    • Intersections
    • NRA
    • TA

    ASJC Scopus subject areas

    • Computer Networks and Communications
    • Software

    Cite this

    Kumar, R., Punera, K., Suel, T., & Vassilvitskii, S. (2009). Top-k aggregation using intersections of ranked inputs. In Proceedings of the 2nd ACM International Conference on Web Search and Data Mining, WSDM'09 (pp. 222-231) https://doi.org/10.1145/1498759.1498830

    Top-k aggregation using intersections of ranked inputs. / Kumar, Ravi; Punera, Kunal; Suel, Torsten; Vassilvitskii, Sergei.

    Proceedings of the 2nd ACM International Conference on Web Search and Data Mining, WSDM'09. 2009. p. 222-231.

    Kumar, R, Punera, K, Suel, T & Vassilvitskii, S 2009, Top-k aggregation using intersections of ranked inputs. in Proceedings of the 2nd ACM International Conference on Web Search and Data Mining, WSDM'09. pp. 222-231, 2nd ACM International Conference on Web Search and Data Mining, WSDM'09, Barcelona, Spain, 2/9/09. https://doi.org/10.1145/1498759.1498830
    Kumar R, Punera K, Suel T, Vassilvitskii S. Top-k aggregation using intersections of ranked inputs. In Proceedings of the 2nd ACM International Conference on Web Search and Data Mining, WSDM'09. 2009. p. 222-231 https://doi.org/10.1145/1498759.1498830
    Kumar, Ravi ; Punera, Kunal ; Suel, Torsten ; Vassilvitskii, Sergei. / Top-k aggregation using intersections of ranked inputs. Proceedings of the 2nd ACM International Conference on Web Search and Data Mining, WSDM'09. 2009. pp. 222-231
    @inproceedings{5c78e041e3ab461aa8a003e765797c64,
    title = "Top-k aggregation using intersections of ranked inputs",
    abstract = "There has been considerable past work on efficiently computing top k objects by aggregating information from multiple ranked lists of these objects. An important instance of this problem is query processing in search engines: One has to combine information from several different posting lists (rankings) of web pages (objects) to obtain the top k web pages to answer user queries. Two particularly well-studied approaches to achieve efficiency in top-k aggregation include early-termination algorithms (e.g., TA and NRA) and pre-aggregation of some of the input lists. However, there has been little work on a rigorous treatment of combining these approaches. We generalize the TA and NRA algorithms to the case when pre-aggregated intersection lists are available in addition to the original lists. We show that our versions of TA and NRA continue to remain {"}instance optimal,{"} a very strong optimality notion that is a highlight of the original TA and NRA algorithms. Using an index of millions of web pages and real-world search engine queries, we empirically characterize the performance gains offered by our new algorithms. We show that the practical benefits of intersection lists can be fully realized only with an early-termination algorithm.",
    keywords = "Early-termination, Intersections, NRA, TA",
    author = "Ravi Kumar and Kunal Punera and Torsten Suel and Sergei Vassilvitskii",
    year = "2009",
    doi = "10.1145/1498759.1498830",
    language = "English (US)",
    isbn = "9781605583907",
    pages = "222--231",
    booktitle = "Proceedings of the 2nd ACM International Conference on Web Search and Data Mining, WSDM'09",

    }

    TY - GEN

    T1 - Top-k aggregation using intersections of ranked inputs

    AU - Kumar, Ravi

    AU - Punera, Kunal

    AU - Suel, Torsten

    AU - Vassilvitskii, Sergei

    PY - 2009

    Y1 - 2009

    N2 - There has been considerable past work on efficiently computing top k objects by aggregating information from multiple ranked lists of these objects. An important instance of this problem is query processing in search engines: One has to combine information from several different posting lists (rankings) of web pages (objects) to obtain the top k web pages to answer user queries. Two particularly well-studied approaches to achieve efficiency in top-k aggregation include early-termination algorithms (e.g., TA and NRA) and pre-aggregation of some of the input lists. However, there has been little work on a rigorous treatment of combining these approaches. We generalize the TA and NRA algorithms to the case when pre-aggregated intersection lists are available in addition to the original lists. We show that our versions of TA and NRA continue to remain "instance optimal," a very strong optimality notion that is a highlight of the original TA and NRA algorithms. Using an index of millions of web pages and real-world search engine queries, we empirically characterize the performance gains offered by our new algorithms. We show that the practical benefits of intersection lists can be fully realized only with an early-termination algorithm.

    KW - Early-termination

    KW - Intersections

    KW - NRA

    KW - TA

    UR - http://www.scopus.com/inward/record.url?scp=70349151328&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=70349151328&partnerID=8YFLogxK

    U2 - 10.1145/1498759.1498830

    DO - 10.1145/1498759.1498830

    M3 - Conference contribution

    SN - 9781605583907

    SP - 222

    EP - 231

    BT - Proceedings of the 2nd ACM International Conference on Web Search and Data Mining, WSDM'09

    ER -