Structuring labeled trees for optimal succinctness, and beyond

Paolo Ferragina, Fabrizio Luccio, Giovanni Manzini, Shanmugavelayutham Muthukrishnan

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    Abstract

    Consider an ordered, static tree T on t nodes where each node has a label from an alphabet Σ. Tree T may be of arbitrary degree and arbitrary shape. Suppose we wish to support basic navigational operations such as finding the parent of a node u, the ith child of u, and any child of u with label α. In a seminal work over fifteen years ago, Jacobson [15] observed that pointer-based tree representations are wasteful in space and introduced the notion of succinct data structures. He studied the special case of unlabeled trees and presented a succinct data structure of 2t+o(t) bits supporting navigational operations in O(1) time. The space used asymptotically matches the information-theoretic lower bound averaged over all trees. This led to a slew of results on succinct data structures for arrays, trees, strings and multisets. Still, for the fundamental problem of structuring labeled trees succinctly, few results, if any, exist, even though labeled trees arise frequently in practice, e.g. in data such as markup text (XML) or in augmented data structures. We present a novel approach to the problem of succinct manipulation of labeled trees by designing what we call the xbw transform of the tree, in the spirit of the well-known Burrows-Wheeler transform for strings. The xbw transform uses path-sorting and grouping to linearize the labeled tree T into two coordinated arrays, one capturing the structure and the other the labels. Using the properties of the xbw transform, we (i) derive the first-known (near-)optimal results for succinct representation of labeled trees with O(1) time for navigation operations, (ii) optimally support the powerful subpath search operation for the first time, and (iii) introduce a notion of tree entropy and present linear-time algorithms for compressing a given labeled tree up to its entropy, beyond the information-theoretic lower bound averaged over all tree inputs. Our xbw transform is simple and likely to spur new results in the theory of tree compression and indexing, and may have some practical impact in XML data processing.
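
The path-sorting and grouping step described in the abstract can be illustrated with a small sketch. This is not the authors' linear-time construction: it is a naive version that assumes each node is keyed by the labels on its upward path (parent first, root last), that nodes are visited in pre-order and then stably sorted by that key, and that the root is flagged as a "last child" by convention. The names `Node` and `xbw_sketch` are hypothetical and chosen only for this illustration.

```python
from typing import List, Optional, Tuple


class Node:
    """Ordered tree node with a label (hypothetical helper type)."""

    def __init__(self, label: str, children: Optional[List["Node"]] = None):
        self.label = label
        self.children = children or []


def xbw_sketch(root: Node) -> Tuple[List[int], List[str]]:
    """Naive path-sorting linearization into two coordinated arrays.

    Returns (last, labels): last[i] is 1 if row i is the rightmost child
    of its parent (structure array), labels[i] is that node's label.
    Assumption: the root is marked last=1 by convention.
    """
    rows = []  # (upward_path, is_last, label), appended in pre-order

    def visit(node: Node, upward_path: Tuple[str, ...], is_last: int) -> None:
        rows.append((upward_path, is_last, node.label))
        # Children see this node's label first, then its own upward path.
        child_path = (node.label,) + upward_path
        for i, child in enumerate(node.children):
            visit(child, child_path, int(i == len(node.children) - 1))

    visit(root, (), 1)
    # list.sort is stable, so nodes with equal paths keep their pre-order.
    rows.sort(key=lambda row: row[0])
    return [row[1] for row in rows], [row[2] for row in rows]


# Example: A has children B, C, B; the two B nodes share the path (B, A).
tree = Node("A", [
    Node("B", [Node("D"), Node("E")]),
    Node("C", [Node("D")]),
    Node("B", [Node("D")]),
])
last, labels = xbw_sketch(tree)
print(last)    # [1, 0, 0, 1, 0, 1, 1, 1]
print(labels)  # ['A', 'B', 'C', 'B', 'D', 'E', 'D', 'D']
```

In the example, the children of each node end up in a contiguous run that is closed by a 1 in the structure array, and the children of all nodes labeled B are grouped together. That grouping is the property the abstract attributes to the transform. The recursive traversal and comparison sort here are for readability only and do not reflect the linear-time bounds claimed in the paper.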

    Original language: English (US)
    Title of host publication: Proceedings - 46th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2005
    Pages: 184-193
    Number of pages: 10
    DOI: https://doi.org/10.1109/SFCS.2005.69
    State: Published - Dec 1 2005
    Event: 46th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2005 - Pittsburgh, PA, United States
    Duration: Oct 23 2005 to Oct 25 2005

    Publication series

    Name: Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS
    Volume: 2005
    ISSN (Print): 0272-5428

    Other

    Other: 46th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2005
    Country: United States
    City: Pittsburgh, PA
    Period: 10/23/05 to 10/25/05

    Fingerprint

    Data structures
    Labels
    XML
    Entropy
    Sorting
    Navigation

    ASJC Scopus subject areas

    • Engineering(all)

    Cite this

    Ferragina, P., Luccio, F., Manzini, G., & Muthukrishnan, S. (2005). Structuring labeled trees for optimal succinctness, and beyond. In Proceedings - 46th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2005 (pp. 184-193). [1530713] (Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS; Vol. 2005). https://doi.org/10.1109/SFCS.2005.69

    @inproceedings{4a0387a357a242da98e41feee1b5c518,
    title = "Structuring labeled trees for optimal succinctness, and beyond",
    author = "Paolo Ferragina and Fabrizio Luccio and Giovanni Manzini and Shanmugavelayutham Muthukrishnan",
    year = "2005",
    month = "12",
    day = "1",
    doi = "10.1109/SFCS.2005.69",
    language = "English (US)",
    isbn = "0769524680",
    series = "Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS",
    pages = "184--193",
    booktitle = "Proceedings - 46th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2005",

    }

    TY - GEN

    T1 - Structuring labeled trees for optimal succinctness, and beyond

    AU - Ferragina, Paolo

    AU - Luccio, Fabrizio

    AU - Manzini, Giovanni

    AU - Muthukrishnan, Shanmugavelayutham

    PY - 2005/12/1

    Y1 - 2005/12/1

    UR - http://www.scopus.com/inward/record.url?scp=33748626529&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=33748626529&partnerID=8YFLogxK

    U2 - 10.1109/SFCS.2005.69

    DO - 10.1109/SFCS.2005.69

    M3 - Conference contribution

    AN - SCOPUS:33748626529

    SN - 0769524680

    SN - 9780769524689

    T3 - Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS

    SP - 184

    EP - 193

    BT - Proceedings - 46th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2005

    ER -