Optimized inverted list assignment in distributed search engine architectures

Jiangong Zhang, Torsten Suel

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    We study efficient query processing in distributed web search engines with global index organization. The main performance bottleneck in this case is due to the large amount of index data that is exchanged between nodes during the processing of a query, and previous work has proposed several techniques for significantly reducing this cost. We describe an approach that provides substantial additional improvement over previous techniques. In particular, we analyze search engine query traces in order to optimize the assignment of index data to the nodes in the system, such that terms frequently occurring together in queries are also often collocated on the same node. Our experiments show that in return for a modest factor increase in storage space, we can achieve a reduction in communication cost of an order of magnitude over the previous best techniques.
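
    The idea of placing terms that frequently occur together in queries on the same node can be illustrated with a small sketch. This is not the algorithm from the paper, which the abstract does not spell out; it is a simple greedy heuristic under stated assumptions. The function names, the `capacity` limit (a maximum number of terms per node), and the cost proxy (distinct nodes touched by a query, minus one) are all hypothetical choices made for illustration, and the sketch stores each term exactly once, so it does not model the extra storage space mentioned in the abstract.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(queries):
    """Count how often each unordered pair of terms appears in the same query."""
    pairs = Counter()
    for query in queries:
        for a, b in combinations(sorted(set(query)), 2):
            pairs[(a, b)] += 1
    return pairs


def greedy_assign(queries, num_nodes, capacity):
    """Place each term on one node, trying to keep frequent pairs together.

    `capacity` is a hypothetical cap on the number of terms per node.
    """
    terms = sorted({t for q in queries for t in q})
    if len(terms) > num_nodes * capacity:
        raise ValueError("not enough total capacity for all terms")

    assignment = {}
    load = [0] * num_nodes

    def least_loaded():
        return min(range(num_nodes), key=lambda n: load[n])

    def place(term, node):
        assignment[term] = node
        load[node] += 1

    # Visit pairs from most to least frequent; ties are broken by term order.
    ranked = sorted(cooccurrence_counts(queries).items(),
                    key=lambda kv: (-kv[1], kv[0]))
    for (a, b), _count in ranked:
        for term, partner in ((a, b), (b, a)):
            if term in assignment:
                continue
            # Prefer the partner's node if it is known and still has room.
            node = assignment.get(partner)
            if node is None or load[node] >= capacity:
                node = least_loaded()
            place(term, node)

    # Terms that never co-occur with another term go to the emptiest node.
    for term in terms:
        if term not in assignment:
            place(term, least_loaded())
    return assignment


def communication_cost(queries, assignment):
    """Proxy cost: extra nodes contacted per query, summed over the trace."""
    return sum(len({assignment[t] for t in q}) - 1 for q in queries if q)
```

    Given a query trace as a list of term lists, `greedy_assign(trace, num_nodes=2, capacity=3)` returns a term-to-node mapping, and `communication_cost(trace, mapping)` gives a rough measure of how many cross-node transfers the trace would trigger under that mapping.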

    Original language: English (US)
    Title of host publication: Proceedings - 21st International Parallel and Distributed Processing Symposium, IPDPS 2007; Abstracts and CD-ROM
    DOI: https://doi.org/10.1109/IPDPS.2007.370231
    State: Published - 2007
    Event: 21st International Parallel and Distributed Processing Symposium, IPDPS 2007 - Long Beach, CA, United States
    Duration: Mar 26 2007 → Mar 30 2007

    Fingerprint

    Search engines
    Search Engine
    Assignment
    Query
    Query processing
    Vertex of a graph
    Costs
    Web Search
    Communication Cost
    Query Processing
    Communication
    Processing
    Trace
    Optimise
    Experiments
    Term
    Experiment
    Architecture

    ASJC Scopus subject areas

    • Hardware and Architecture
    • Software
    • Mathematics(all)

    Cite this

    Zhang, J., & Suel, T. (2007). Optimized inverted list assignment in distributed search engine architectures. In Proceedings - 21st International Parallel and Distributed Processing Symposium, IPDPS 2007; Abstracts and CD-ROM [4227959] https://doi.org/10.1109/IPDPS.2007.370231

    Optimized inverted list assignment in distributed search engine architectures. / Zhang, Jiangong; Suel, Torsten.

    Proceedings - 21st International Parallel and Distributed Processing Symposium, IPDPS 2007; Abstracts and CD-ROM. 2007. 4227959.

    Zhang, J & Suel, T 2007, Optimized inverted list assignment in distributed search engine architectures. in Proceedings - 21st International Parallel and Distributed Processing Symposium, IPDPS 2007; Abstracts and CD-ROM., 4227959, 21st International Parallel and Distributed Processing Symposium, IPDPS 2007, Long Beach, CA, United States, 3/26/07. https://doi.org/10.1109/IPDPS.2007.370231
    Zhang J, Suel T. Optimized inverted list assignment in distributed search engine architectures. In Proceedings - 21st International Parallel and Distributed Processing Symposium, IPDPS 2007; Abstracts and CD-ROM. 2007. 4227959 https://doi.org/10.1109/IPDPS.2007.370231
    Zhang, Jiangong ; Suel, Torsten. / Optimized inverted list assignment in distributed search engine architectures. Proceedings - 21st International Parallel and Distributed Processing Symposium, IPDPS 2007; Abstracts and CD-ROM. 2007.
    @inproceedings{6193e52b9d6e44fcad8ed8accb019cda,
    title = "Optimized inverted list assignment in distributed search engine architectures",
    abstract = "We study efficient query processing in distributed web search engines with global index organization. The main performance bottleneck in this case is due to the large amount of index data that is exchanged between nodes during the processing of a query, and previous work has proposed several techniques for significantly reducing this cost. We describe an approach that provides substantial additional improvement over previous techniques. In particular, we analyze search engine query traces in order to optimize the assignment of index data to the nodes in the system, such that terms frequently occurring together in queries are also often collocated on the same node. Our experiments show that in return for a modest factor increase in storage space, we can achieve a reduction in communication cost of an order of magnitude over the previous best techniques.",
    author = "Jiangong Zhang and Torsten Suel",
    year = "2007",
    doi = "10.1109/IPDPS.2007.370231",
    language = "English (US)",
    isbn = "1424409101",
    booktitle = "Proceedings - 21st International Parallel and Distributed Processing Symposium, IPDPS 2007; Abstracts and CD-ROM",

    }

    TY - GEN

    T1 - Optimized inverted list assignment in distributed search engine architectures

    AU - Zhang, Jiangong

    AU - Suel, Torsten

    PY - 2007

    Y1 - 2007

    AB - We study efficient query processing in distributed web search engines with global index organization. The main performance bottleneck in this case is due to the large amount of index data that is exchanged between nodes during the processing of a query, and previous work has proposed several techniques for significantly reducing this cost. We describe an approach that provides substantial additional improvement over previous techniques. In particular, we analyze search engine query traces in order to optimize the assignment of index data to the nodes in the system, such that terms frequently occurring together in queries are also often collocated on the same node. Our experiments show that in return for a modest factor increase in storage space, we can achieve a reduction in communication cost of an order of magnitude over the previous best techniques.

    UR - http://www.scopus.com/inward/record.url?scp=34548721472&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=34548721472&partnerID=8YFLogxK

    U2 - 10.1109/IPDPS.2007.370231

    DO - 10.1109/IPDPS.2007.370231

    M3 - Conference contribution

    AN - SCOPUS:34548721472

    SN - 1424409101

    SN - 9781424409105

    BT - Proceedings - 21st International Parallel and Distributed Processing Symposium, IPDPS 2007; Abstracts and CD-ROM

    ER -