Compressing and indexing labeled trees, with applications

Paolo Ferragina, Fabrizio Luccio, Giovanni Manzini, Shanmugavelayutham Muthukrishnan

    Research output: Contribution to journal › Article

    Abstract

    Consider an ordered, static tree T where each node has a label from alphabet σ. Tree T may be of arbitrary degree and shape. Our goal is designing a compressed storage scheme of T that supports basic navigational operations among the immediate neighbors of a node (i.e., parent, ith child, or any child with some label, ...) as well as more sophisticated path-based search operations over its labeled structure. We present a novel approach to this problem by designing what we call the XBW-transform of the tree in the spirit of the well-known Burrows-Wheeler transform for strings [1994]. The XBW-transform uses path-sorting to linearize the labeled tree T into two coordinated arrays, one capturing the structure and the other the labels. For the first time, by using the properties of the XBW-transform, our compressed indexes go beyond the information-theoretic lower bound, and support navigational and path-search operations over labeled trees within (near-)optimal time bounds and entropy-bounded space. Our XBW-transform is simple and likely to spur new results in the theory of tree compression and indexing, as well as interesting application contexts. As an example, we use the XBW-transform to design and implement a compressed index for XML documents whose compression ratio is significantly better than the one achievable by state-of-the-art tools, and its query time performance is orders of magnitude faster.
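As a rough illustration of the path-sorting idea in the abstract, the sketch below linearizes a small labeled tree into two coordinated arrays. It is a minimal sketch under stated assumptions, not the authors' implementation: the `Node` class and the `xbw` function are hypothetical names, each node is keyed by the labels on its upward path (parent first, root last), ties keep preorder via a stable sort, the root is arbitrarily marked as a last child, and any extra marker separating leaves from internal nodes is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical labeled ordered-tree node used only for this sketch."""
    label: str
    children: list = field(default_factory=list)


def xbw(root):
    """Return (last, labels): two coordinated arrays describing the tree.

    last[i] is 1 if the i-th node in sorted order is the rightmost child
    of its parent (assumption: the root is also marked 1).
    labels[i] is the label of that node.
    """
    rows = []  # (upward_path, is_last_child, label), collected in preorder

    def visit(node, upward_path, is_last):
        rows.append((upward_path, is_last, node.label))
        # The children's upward path starts with this node's label.
        child_path = (node.label,) + upward_path
        for i, child in enumerate(node.children):
            visit(child, child_path, i == len(node.children) - 1)

    visit(root, (), True)
    # Python's sort is stable, so nodes with equal paths keep preorder.
    rows.sort(key=lambda row: row[0])
    last = [int(row[1]) for row in rows]
    labels = [row[2] for row in rows]
    return last, labels


tree = Node("A", [Node("B", [Node("D")]), Node("C")])
last, labels = xbw(tree)
print(last, labels)
```

Sorting by upward path places all children of nodes that share the same labeled ancestry next to each other, which is what makes the two arrays compressible and searchable in the manner the abstract describes.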

    Original language: English (US)
    Article number: 4
    Journal: Journal of the ACM
    Volume: 57
    Issue number: 1
    DOI: https://doi.org/10.1145/1613676.1613680
    State: Published - Nov 1 2009

    Fingerprint

    Mathematical transformations
    Labels
    Sorting
    XML
    Entropy

    Keywords

    • Burrows-Wheeler transform
    • Labeled tree compression
    • Labeled tree indexing
    • Path searching
    • Tree navigation
    • XML compression
    • XML indexing

    ASJC Scopus subject areas

    • Software
    • Control and Systems Engineering
    • Information Systems
    • Hardware and Architecture
    • Artificial Intelligence

    Cite this

    Ferragina, P., Luccio, F., Manzini, G., & Muthukrishnan, S. (2009). Compressing and indexing labeled trees, with applications. Journal of the ACM, 57(1), [4]. https://doi.org/10.1145/1613676.1613680

    @article{0afb4436edf94e81a1332b3877a4aede,
    title = "Compressing and indexing labeled trees, with applications",
    keywords = "Burrows-Wheeler transform, Labeled tree compression, Labeled tree indexing, Path searching, Tree navigation, XML compression, XML indexing",
    author = "Paolo Ferragina and Fabrizio Luccio and Giovanni Manzini and Shanmugavelayutham Muthukrishnan",
    year = "2009",
    month = "11",
    day = "1",
    doi = "10.1145/1613676.1613680",
    language = "English (US)",
    volume = "57",
    journal = "Journal of the ACM",
    issn = "0004-5411",
    publisher = "Association for Computing Machinery (ACM)",
    number = "1",

    }

    TY - JOUR

    T1 - Compressing and indexing labeled trees, with applications

    AU - Ferragina, Paolo

    AU - Luccio, Fabrizio

    AU - Manzini, Giovanni

    AU - Muthukrishnan, Shanmugavelayutham

    PY - 2009/11/1

    Y1 - 2009/11/1

    KW - Burrows-Wheeler transform

    KW - Labeled tree compression

    KW - Labeled tree indexing

    KW - Path searching

    KW - Tree navigation

    KW - XML compression

    KW - XML indexing

    UR - http://www.scopus.com/inward/record.url?scp=71449088969&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=71449088969&partnerID=8YFLogxK

    U2 - 10.1145/1613676.1613680

    DO - 10.1145/1613676.1613680

    M3 - Article

    AN - SCOPUS:71449088969

    VL - 57

    JO - Journal of the ACM

    JF - Journal of the ACM

    SN - 0004-5411

    IS - 1

    M1 - 4

    ER -