On the sorting-complexity of suffix tree construction

Martin Farach-Colton, Paolo Ferragina, Shanmugavelayutham Muthukrishnan

    Research output: Contribution to journal › Article

    Abstract

    The suffix tree of a string is the fundamental data structure of combinatorial pattern matching. We present a recursive technique for building suffix trees that yields optimal algorithms in different computational models. Sorting is an inherent bottleneck in building suffix trees and our algorithms match the sorting lower bound. Specifically, we present the following results. (1) Weiner [1973], who introduced the data structure, gave an optimal O(n)-time algorithm for building the suffix tree of an n-character string drawn from a constant-size alphabet. In the comparison model, there is a trivial Ω(n log n)-time lower bound based on sorting, and Weiner's algorithm matches this bound. For integer alphabets, the fastest known algorithm is the O(n log n) time comparison-based algorithm, but no super-linear lower bound is known. Closing this gap.
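The link between suffix structures and sorting that the abstract relies on can be illustrated with a minimal sketch. It assumes a naive comparison-based suffix array rather than a suffix tree, and it is not the recursive technique of the paper; the function names `suffix_array` and `sorted_distinct_chars` are hypothetical and chosen only for illustration. The sketch shows that once the suffixes are in lexicographic order, the sorted alphabet of the string can be read off directly, which is why a sorting lower bound carries over to suffix sorting.

```python
def suffix_array(text: str) -> list[int]:
    """Return the start positions of the suffixes of `text` in lexicographic order.

    Naive illustration only: a comparison sort over n suffixes uses
    O(n log n) comparisons, and each comparison may scan O(n) characters.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def sorted_distinct_chars(text: str) -> list[str]:
    """Read the sorted distinct characters of `text` off its suffix array.

    The first characters of the suffixes, visited in suffix-array order,
    are nondecreasing, so suffix sorting also sorts the alphabet.
    """
    out: list[str] = []
    for i in suffix_array(text):
        if not out or out[-1] != text[i]:
            out.append(text[i])
    return out


if __name__ == "__main__":
    print(suffix_array("banana"))           # [5, 3, 1, 0, 4, 2]
    print(sorted_distinct_chars("banana"))  # ['a', 'b', 'n']
```

In a suffix tree with ordered children, the same information appears as the labels on the edges leaving the root, so building the tree is at least as hard as sorting the characters of the input.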

    Original language: English (US)
    Pages (from-to): 987-1011
    Number of pages: 25
    Journal: Journal of the ACM
    Volume: 47
    Issue number: 6
    DOI: https://doi.org/10.1145/355541.355547
    State: Published - Jan 1 2000

    Fingerprint

    Sorting
    Data structures
    Pattern matching

    Keywords

    • DAM model
    • External-memory data structures
    • RAM model
    • Sorting complexity
    • Suffix array
    • Suffix tree

    ASJC Scopus subject areas

    • Software
    • Control and Systems Engineering
    • Information Systems
    • Hardware and Architecture
    • Artificial Intelligence

    Cite this

    Farach-Colton, M., Ferragina, P., & Muthukrishnan, S. (2000). On the sorting-complexity of suffix tree construction. Journal of the ACM, 47(6), 987-1011. https://doi.org/10.1145/355541.355547

    On the sorting-complexity of suffix tree construction. / Farach-Colton, Martin; Ferragina, Paolo; Muthukrishnan, Shanmugavelayutham.

    In: Journal of the ACM, Vol. 47, No. 6, 01.01.2000, p. 987-1011.

    Farach-Colton, M, Ferragina, P & Muthukrishnan, S 2000, 'On the sorting-complexity of suffix tree construction', Journal of the ACM, vol. 47, no. 6, pp. 987-1011. https://doi.org/10.1145/355541.355547
    Farach-Colton M, Ferragina P, Muthukrishnan S. On the sorting-complexity of suffix tree construction. Journal of the ACM. 2000 Jan 1;47(6):987-1011. https://doi.org/10.1145/355541.355547
    Farach-Colton, Martin ; Ferragina, Paolo ; Muthukrishnan, Shanmugavelayutham. / On the sorting-complexity of suffix tree construction. In: Journal of the ACM. 2000 ; Vol. 47, No. 6. pp. 987-1011.
    @article{79cc2816c1b14ccab4717ab09b705592,
    title = "On the sorting-complexity of suffix tree construction",
    abstract = "The suffix tree of a string is the fundamental data structure of combinatorial pattern matching. We present a recursive technique for building suffix trees that yields optimal algorithms in different computational models. Sorting is an inherent bottleneck in building suffix trees and our algorithms match the sorting lower bound. Specifically, we present the following results. (1) Weiner [1973], who introduced the data structure, gave an optimal O(n)-time algorithm for building the suffix tree of an n-character string drawn from a constant-size alphabet. In the comparison model, there is a trivial Ω(n log n)-time lower bound based on sorting, and Weiner's algorithm matches this bound. For integer alphabets, the fastest known algorithm is the O(n log n) time comparison-based algorithm, but no super-linear lower bound is known. Closing this gap.",
    keywords = "DAM model, External-memory data structures, RAM model, Sorting complexity, Suffix array, Suffix tree",
    author = "Martin Farach-Colton and Paolo Ferragina and Shanmugavelayutham Muthukrishnan",
    year = "2000",
    month = "1",
    day = "1",
    doi = "10.1145/355541.355547",
    language = "English (US)",
    volume = "47",
    pages = "987--1011",
    journal = "Journal of the ACM",
    issn = "0004-5411",
    publisher = "Association for Computing Machinery (ACM)",
    number = "6",
    }

    TY - JOUR

    T1 - On the sorting-complexity of suffix tree construction

    AU - Farach-Colton, Martin

    AU - Ferragina, Paolo

    AU - Muthukrishnan, Shanmugavelayutham

    PY - 2000/1/1

    Y1 - 2000/1/1

    N2 - The suffix tree of a string is the fundamental data structure of combinatorial pattern matching. We present a recursive technique for building suffix trees that yields optimal algorithms in different computational models. Sorting is an inherent bottleneck in building suffix trees and our algorithms match the sorting lower bound. Specifically, we present the following results. (1) Weiner [1973], who introduced the data structure, gave an optimal O(n)-time algorithm for building the suffix tree of an n-character string drawn from a constant-size alphabet. In the comparison model, there is a trivial Ω(n log n)-time lower bound based on sorting, and Weiner's algorithm matches this bound. For integer alphabets, the fastest known algorithm is the O(n log n) time comparison-based algorithm, but no super-linear lower bound is known. Closing this gap.

    KW - DAM model

    KW - External-memory data structures

    KW - RAM model

    KW - Sorting complexity

    KW - Suffix array

    KW - Suffix tree

    UR - http://www.scopus.com/inward/record.url?scp=0037624231&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=0037624231&partnerID=8YFLogxK

    U2 - 10.1145/355541.355547

    DO - 10.1145/355541.355547

    M3 - Article

    AN - SCOPUS:0037624231

    VL - 47

    SP - 987

    EP - 1011

    JO - Journal of the ACM

    JF - Journal of the ACM

    SN - 0004-5411

    IS - 6

    ER -