Improved methods for static index pruning

Wei Jiang, Juan Rodriguez, Torsten Suel

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Static index pruning is a performance optimization technique for search engines that attempts to identify and remove index postings that are unlikely to lead to top results for typical user queries. The goal is to obtain a much smaller inverted index that can quickly return results that are (almost) as good as those for the unpruned index. We make two contributions. First, we improve on previous results for pruned index size through a careful analysis of both document and query distribution characteristics. We derive an initial model based on unigram probabilities that obtains gains over previous work in some cases, and a bigram-based approach that achieves some additional improvements. We also devise a simple method for generating query logs in the absence of real-life queries, useful in modeling top results. Our second contribution is to explore, and compare against, previously proposed approaches that perform pruning based on how often documents or postings appeared in top positions in the past.
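
    The general idea of static index pruning with a unigram query model can be illustrated with a minimal sketch. This is not the authors' model: the priority formula (term probability times a per-posting impact score), the function names, and the toy data are all assumptions chosen only to show how postings might be ranked and cut to a size budget.

```python
from collections import Counter


def unigram_term_probs(query_log):
    """Estimate P(term) from a query log (a list of queries, each a list of terms)."""
    counts = Counter(term for query in query_log for term in query)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def prune_index(index, term_probs, keep_fraction):
    """Keep only the highest-priority postings across the whole index.

    index: dict mapping term -> list of (doc_id, impact_score)
    priority = P(term) * impact_score   (hypothetical heuristic, for illustration)
    """
    scored = [
        (term_probs.get(term, 0.0) * impact, term, doc_id, impact)
        for term, postings in index.items()
        for doc_id, impact in postings
    ]
    # Highest priority first; ties broken by term and doc_id for determinism.
    scored.sort(key=lambda item: (-item[0], item[1], item[2]))
    budget = int(len(scored) * keep_fraction)

    pruned = {}
    for _, term, doc_id, impact in scored[:budget]:
        pruned.setdefault(term, []).append((doc_id, impact))
    return pruned


# Toy inverted index and query log (made-up values).
index = {
    "index": [(1, 2.0), (2, 0.5)],
    "pruning": [(1, 1.5), (3, 1.0)],
    "rare": [(2, 3.0)],
}
query_log = [["index", "pruning"], ["index"], ["pruning", "index"]]

probs = unigram_term_probs(query_log)
pruned = prune_index(index, probs, keep_fraction=0.5)
print(pruned)
```

    In this toy setup the term "rare" never occurs in the query log, so its posting receives priority zero and is dropped even though its impact score is the highest. A bigram-based variant would presumably estimate probabilities over term pairs instead of single terms.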

    Original language: English (US)
    Title of host publication: Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016
    Publisher: Institute of Electrical and Electronics Engineers Inc.
    Pages: 686-695
    Number of pages: 10
    ISBN (Electronic): 9781467390040
    DOIs: https://doi.org/10.1109/BigData.2016.7840661
    State: Published - Feb 2 2017
    Event: 4th IEEE International Conference on Big Data, Big Data 2016 - Washington, United States
    Duration: Dec 5 2016 to Dec 8 2016

    Other

    Other: 4th IEEE International Conference on Big Data, Big Data 2016
    Country: United States
    City: Washington
    Period: 12/5/16 to 12/8/16

    Keywords

    • index
    • search
    • static pruning

    ASJC Scopus subject areas

    • Computer Networks and Communications
    • Information Systems
    • Hardware and Architecture

    Cite this

    Jiang, W., Rodriguez, J., & Suel, T. (2017). Improved methods for static index pruning. In Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016 (pp. 686-695). [7840661] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/BigData.2016.7840661