Efficient term proximity search with term-pair indexes

Hao Yan, Shuming Shi, Fan Zhang, Torsten Suel, Ji Rong Wen

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    Abstract

    There has been a large amount of research on early termination techniques in web search and information retrieval. Such techniques return the top-k documents without scanning and evaluating the full inverted lists of the query terms, and can thus greatly improve query processing efficiency. However, only a limited amount of work on efficient top-k processing considers the impact of term proximity, i.e., the distance between term occurrences in a document, which has recently been integrated into a number of retrieval models to improve effectiveness. In this paper, we propose new early termination techniques for efficient query processing for the case where term proximity is integrated into the retrieval model. We propose new index structures based on a term-pair index, and study new document retrieval strategies on the resulting indexes. We perform a detailed experimental evaluation of our new techniques and compare them with existing approaches. Experimental results on large-scale data sets show that our techniques can significantly improve the efficiency of query processing.
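
    As a rough illustration of the idea of a term-pair index, the sketch below maps each pair of terms that co-occur within a small window to the documents containing them and the smallest distance observed. This is only a toy example under stated assumptions, not the index structures or retrieval strategies proposed in the paper: the function names, the `window` parameter, and the 1/d^2 proximity score are all hypothetical choices made for illustration, and no early termination is performed.

```python
from collections import defaultdict
from itertools import combinations


def build_term_pair_index(docs, window=3):
    """Map each unordered term pair to {doc_id: minimum distance}.

    Only pairs whose occurrences lie within `window` positions of each
    other are indexed (an assumption made for this sketch).
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        for i, t1 in enumerate(tokens):
            # Look ahead at most `window` positions.
            for j in range(i + 1, min(i + window + 1, len(tokens))):
                t2 = tokens[j]
                if t1 == t2:
                    continue
                pair = tuple(sorted((t1, t2)))
                dist = j - i
                prev = index[pair].get(doc_id)
                if prev is None or dist < prev:
                    index[pair][doc_id] = dist
    return index


def proximity_scores(index, query_terms):
    """Sum a hypothetical 1/d^2 score over all pairs of query terms."""
    scores = defaultdict(float)
    for t1, t2 in combinations(sorted(set(query_terms)), 2):
        # Pairs are stored in sorted order, matching the lookup key here.
        for doc_id, dist in index.get((t1, t2), {}).items():
            scores[doc_id] += 1.0 / (dist * dist)
    return dict(scores)


docs = {
    1: "efficient term proximity search",
    2: "term index and proximity",
    3: "proximity of each term",
}
index = build_term_pair_index(docs, window=3)
scores = proximity_scores(index, ["term", "proximity"])
```

    With such a structure, the proximity contribution of a query term pair can be read from a single posting list instead of being computed by intersecting the positional lists of both terms, which is the kind of saving a term-pair index aims for.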

    Original language: English (US)
    Title of host publication: CIKM'10 - Proceedings of the 19th International Conference on Information and Knowledge Management and Co-located Workshops
    Pages: 1229-1238
    Number of pages: 10
    DOI: https://doi.org/10.1145/1871437.1871593
    State: Published - 2010
    Event: 19th International Conference on Information and Knowledge Management and Co-located Workshops, CIKM'10 - Toronto, ON, Canada
    Duration: Oct 26 2010 to Oct 30 2010

    Other

    Other: 19th International Conference on Information and Knowledge Management and Co-located Workshops, CIKM'10
    Country: Canada
    City: Toronto, ON
    Period: 10/26/10 to 10/30/10

    Fingerprint

    Proximity
    Query processing
    Termination
    Integrated
    Top-k
    Web search
    Evaluation
    Query
    Information retrieval

    Keywords

    • Document structure
    • Term proximity
    • Term-pair index
    • Top-k

    ASJC Scopus subject areas

    • Business, Management and Accounting(all)
    • Decision Sciences(all)

    Cite this

    Yan, H., Shi, S., Zhang, F., Suel, T., & Wen, J. R. (2010). Efficient term proximity search with term-pair indexes. In CIKM'10 - Proceedings of the 19th International Conference on Information and Knowledge Management and Co-located Workshops (pp. 1229-1238) https://doi.org/10.1145/1871437.1871593

    @inproceedings{64ab764e843b4b278a77a8d2385c5fb9,
    title = "Efficient term proximity search with term-pair indexes",
    keywords = "Document structure, Term proximity, Term-pair index, Top-k",
    author = "Hao Yan and Shuming Shi and Fan Zhang and Torsten Suel and Wen, {Ji Rong}",
    year = "2010",
    doi = "10.1145/1871437.1871593",
    language = "English (US)",
    isbn = "9781450300995",
    pages = "1229--1238",
    booktitle = "CIKM'10 - Proceedings of the 19th International Conference on Information and Knowledge Management and Co-located Workshops",

    }

    TY - GEN
    T1 - Efficient term proximity search with term-pair indexes
    AU - Yan, Hao
    AU - Shi, Shuming
    AU - Zhang, Fan
    AU - Suel, Torsten
    AU - Wen, Ji Rong
    PY - 2010
    Y1 - 2010
    KW - Document structure
    KW - Term proximity
    KW - Term-pair index
    KW - Top-k
    UR - http://www.scopus.com/inward/record.url?scp=78651335518&partnerID=8YFLogxK
    UR - http://www.scopus.com/inward/citedby.url?scp=78651335518&partnerID=8YFLogxK
    U2 - 10.1145/1871437.1871593
    DO - 10.1145/1871437.1871593
    M3 - Conference contribution
    AN - SCOPUS:78651335518
    SN - 9781450300995
    SP - 1229
    EP - 1238
    BT - CIKM'10 - Proceedings of the 19th International Conference on Information and Knowledge Management and Co-located Workshops
    ER -