Compressing inverted indexes with recursive graph bisection: A reproducibility study

Joel Mackenzie, Antonio Mallia, Matthias Petri, J. Shane Culpepper, Torsten Suel

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    Abstract

    Document reordering is an important but often overlooked preprocessing stage in index construction. Reordering document identifiers in graphs and inverted indexes has been shown to reduce storage costs and improve processing efficiency in the resulting indexes. However, surprisingly few document reordering algorithms are publicly available despite their importance. A new reordering algorithm derived from recursive graph bisection was recently proposed by Dhulipala et al., and shown to be highly effective and efficient when compared against other state-of-the-art reordering strategies. In this work, we present a reproducibility study of this new algorithm. We describe the implementation challenges encountered, and explore the performance characteristics of our clean-room reimplementation. We show that we are able to successfully reproduce the core results of the original paper, and show that the algorithm generalizes to other collections and indexing frameworks. Furthermore, we make our implementation publicly available to help promote further research in this space.
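The abstract's claim that reordering document identifiers reduces storage costs can be illustrated with a toy example. The sketch below is not the authors' implementation and does not perform graph bisection. It only compares two hand-picked identifier assignments under a log-gap cost, which is an assumed proxy for compressed size. The names `gap_cost_bits`, `index_cost`, and the toy index are hypothetical and chosen for illustration.

```python
import math

def gap_cost_bits(postings):
    """Approximate cost of one postings list as the sum of log2(d-gap).

    Assumption: log2 of the gap is used as a rough proxy for the bits a
    gap-based codec would spend. The first gap is measured from -1.
    """
    total = 0.0
    prev = -1
    for doc in sorted(postings):
        total += math.log2(doc - prev)
        prev = doc
    return total

def index_cost(index, order):
    """Total proxy cost of an index after renumbering documents.

    order[d] is the new identifier assigned to original document d.
    """
    return sum(
        gap_cost_bits([order[d] for d in plist])
        for plist in index.values()
    )

# Toy inverted index (hypothetical data): term -> original document ids.
index = {
    "a": [0, 2, 4, 6],
    "b": [1, 3, 5, 7],
}

identity = list(range(8))              # keep the original numbering
clustered = [0, 4, 1, 5, 2, 6, 3, 7]   # documents sharing a term become neighbours

print(index_cost(index, identity))   # 7.0
print(index_cost(index, clustered))  # about 2.32
```

In this toy case, placing documents that share a term next to each other shrinks the gaps and therefore the proxy cost. A reordering algorithm searches for such an assignment automatically rather than by hand.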

    Original language: English (US)
    Title of host publication: Advances in Information Retrieval - 41st European Conference on IR Research, ECIR 2019, Proceedings
    Editors: Norbert Fuhr, Leif Azzopardi, Benno Stein, Claudia Hauff, Philipp Mayr, Djoerd Hiemstra
    Publisher: Springer-Verlag
    Pages: 339-352
    Number of pages: 14
    ISBN (Print): 9783030157111
    DOI: https://doi.org/10.1007/978-3-030-15712-8_22
    State: Published - Jan 1 2019
    Event: 41st European Conference on Information Retrieval, ECIR 2019 - Cologne, Germany
    Duration: Apr 14 2019 to Apr 18 2019

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 11437 LNCS
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Conference

    Conference: 41st European Conference on Information Retrieval, ECIR 2019
    Country: Germany
    City: Cologne
    Period: 4/14/19 to 4/18/19

    Fingerprint

    Bisection
    Reordering
    Reproducibility
    Graph in graph theory
    Clean rooms
    Indexing
    Preprocessing
    Processing
    Generalise
    Costs

    Keywords

    • Compression
    • Efficiency
    • Reordering
    • Reproducibility

    ASJC Scopus subject areas

    • Theoretical Computer Science
    • Computer Science (all)

    Cite this

    Mackenzie, J., Mallia, A., Petri, M., Culpepper, J. S., & Suel, T. (2019). Compressing inverted indexes with recursive graph bisection: A reproducibility study. In N. Fuhr, L. Azzopardi, B. Stein, C. Hauff, P. Mayr, & D. Hiemstra (Eds.), Advances in Information Retrieval - 41st European Conference on IR Research, ECIR 2019, Proceedings (pp. 339-352). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11437 LNCS). Springer-Verlag. https://doi.org/10.1007/978-3-030-15712-8_22

    @inproceedings{ba1faeda6a1245c48065d464f20afee2,
    title = "Compressing inverted indexes with recursive graph bisection: A reproducibility study",
    abstract = "Document reordering is an important but often overlooked preprocessing stage in index construction. Reordering document identifiers in graphs and inverted indexes has been shown to reduce storage costs and improve processing efficiency in the resulting indexes. However, surprisingly few document reordering algorithms are publicly available despite their importance. A new reordering algorithm derived from recursive graph bisection was recently proposed by Dhulipala et al., and shown to be highly effective and efficient when compared against other state-of-the-art reordering strategies. In this work, we present a reproducibility study of this new algorithm. We describe the implementation challenges encountered, and explore the performance characteristics of our clean-room reimplementation. We show that we are able to successfully reproduce the core results of the original paper, and show that the algorithm generalizes to other collections and indexing frameworks. Furthermore, we make our implementation publicly available to help promote further research in this space.",
    keywords = "Compression, Efficiency, Reordering, Reproducibility",
    author = "Joel Mackenzie and Antonio Mallia and Matthias Petri and Culpepper, {J. Shane} and Torsten Suel",
    year = "2019",
    month = "1",
    day = "1",
    doi = "10.1007/978-3-030-15712-8_22",
    language = "English (US)",
    isbn = "9783030157111",
    series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
    publisher = "Springer-Verlag",
    pages = "339--352",
    editor = "Norbert Fuhr and Leif Azzopardi and Benno Stein and Claudia Hauff and Philipp Mayr and Djoerd Hiemstra",
    booktitle = "Advances in Information Retrieval - 41st European Conference on IR Research, ECIR 2019, Proceedings",
    }

    TY - GEN
    T1 - Compressing inverted indexes with recursive graph bisection
    T2 - A reproducibility study
    AU - Mackenzie, Joel
    AU - Mallia, Antonio
    AU - Petri, Matthias
    AU - Culpepper, J. Shane
    AU - Suel, Torsten
    PY - 2019/1/1
    Y1 - 2019/1/1
    AB - Document reordering is an important but often overlooked preprocessing stage in index construction. Reordering document identifiers in graphs and inverted indexes has been shown to reduce storage costs and improve processing efficiency in the resulting indexes. However, surprisingly few document reordering algorithms are publicly available despite their importance. A new reordering algorithm derived from recursive graph bisection was recently proposed by Dhulipala et al., and shown to be highly effective and efficient when compared against other state-of-the-art reordering strategies. In this work, we present a reproducibility study of this new algorithm. We describe the implementation challenges encountered, and explore the performance characteristics of our clean-room reimplementation. We show that we are able to successfully reproduce the core results of the original paper, and show that the algorithm generalizes to other collections and indexing frameworks. Furthermore, we make our implementation publicly available to help promote further research in this space.

    KW - Compression
    KW - Efficiency
    KW - Reordering
    KW - Reproducibility
    UR - http://www.scopus.com/inward/record.url?scp=85064881304&partnerID=8YFLogxK
    UR - http://www.scopus.com/inward/citedby.url?scp=85064881304&partnerID=8YFLogxK
    U2 - 10.1007/978-3-030-15712-8_22
    DO - 10.1007/978-3-030-15712-8_22
    M3 - Conference contribution
    AN - SCOPUS:85064881304
    SN - 9783030157111
    T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    SP - 339
    EP - 352
    BT - Advances in Information Retrieval - 41st European Conference on IR Research, ECIR 2019, Proceedings
    A2 - Fuhr, Norbert
    A2 - Azzopardi, Leif
    A2 - Stein, Benno
    A2 - Hauff, Claudia
    A2 - Mayr, Philipp
    A2 - Hiemstra, Djoerd
    PB - Springer-Verlag
    ER -