Two-dimensional substring indexing

Paolo Ferragina, Nick Koudas, Shanmugavelayutham Muthukrishnan, Divesh Srivastava

    Research output: Contribution to journal (article)

    Abstract

    As databases have expanded in scope to storing string data (XML documents, product catalogs), it has become increasingly important to search databases based on matching substrings, often on multiple, correlated dimensions. While string B-trees are I/O optimal in one dimension, no index structure with non-trivial query bounds is known for two-dimensional substring indexing. In this paper, we present a technique for two-dimensional substring indexing based on a reduction to the geometric problem of identifying common colors in two ranges containing colored points. We develop an I/O efficient algorithm for solving the common colors problem, and use it to obtain an I/O efficient (poly-logarithmic query time) algorithm for the two-dimensional substring indexing problem. Our techniques result in a family of secondary memory index structures that trade space for time, with no loss of accuracy.
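The reduction mentioned in the abstract can be illustrated with a small in-memory sketch. It is only one reading of the idea under stated assumptions, not the paper's I/O-efficient structure: each record has two string fields, every suffix of a field is a point colored with its record id, the suffixes matching a pattern form a contiguous range in sorted order, and the answer to a two-pattern query is the set of colors common to the two ranges. The function names (`build_suffix_index`, `matching_range`, `common_colors`, `query`) and the sample records are hypothetical, and the set intersection is a naive stand-in for the common colors algorithm.

```python
from bisect import bisect_left, bisect_right


def build_suffix_index(strings):
    """Return all suffixes of all strings, sorted, each tagged with its record id (color)."""
    entries = [(s[i:], rid) for rid, s in enumerate(strings) for i in range(len(s))]
    entries.sort()
    return entries


def matching_range(index, pattern):
    """Return the half-open range [lo, hi) of sorted suffixes that start with pattern.

    Truncating sorted strings to a fixed length keeps them in non-decreasing
    order, so binary search on the truncated keys is valid.
    """
    keys = [suffix[:len(pattern)] for suffix, _ in index]
    return bisect_left(keys, pattern), bisect_right(keys, pattern)


def common_colors(index1, range1, index2, range2):
    """Naively intersect the colors (record ids) found in the two ranges."""
    colors1 = {rid for _, rid in index1[range1[0]:range1[1]]}
    colors2 = {rid for _, rid in index2[range2[0]:range2[1]]}
    return colors1 & colors2


def query(records, p1, p2):
    """Ids of records whose first field contains p1 and whose second field contains p2."""
    idx1 = build_suffix_index([a for a, _ in records])
    idx2 = build_suffix_index([b for _, b in records])
    hits = common_colors(idx1, matching_range(idx1, p1), idx2, matching_range(idx2, p2))
    return sorted(hits)


# Hypothetical sample data: (field 1, field 2) per record.
records = [("database", "xml"), ("dataset", "catalog"), ("string", "xml catalog")]
print(query(records, "ata", "cat"))
```

This sketch rebuilds both indexes on every query and scans each range in full, so it shows only the shape of the reduction; the query bounds and space-time trade-offs described in the abstract come from the paper's secondary-memory structures, which are not reproduced here.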

    Original language: English (US)
    Pages (from-to): 763-774
    Number of pages: 12
    Journal: Journal of Computer and System Sciences
    Volume: 66
    Issue number: 4
    DOI: https://doi.org/10.1016/S0022-0000(03)00028-X
    State: Published - Jan 1 2003

    Fingerprint

    Indexing
    Strings
    Query
    B-tree
    Color
    XML
    One Dimension
    Logarithmic
    Efficient Algorithms
    Data storage equipment
    Range of data
    Family
    Trade

    ASJC Scopus subject areas

    • Theoretical Computer Science
    • Computer Networks and Communications
    • Computational Theory and Mathematics
    • Applied Mathematics

    Cite this

    Ferragina, P., Koudas, N., Muthukrishnan, S., & Srivastava, D. (2003). Two-dimensional substring indexing. Journal of Computer and System Sciences, 66(4), 763-774. https://doi.org/10.1016/S0022-0000(03)00028-X

    Two-dimensional substring indexing. / Ferragina, Paolo; Koudas, Nick; Muthukrishnan, Shanmugavelayutham; Srivastava, Divesh.

    In: Journal of Computer and System Sciences, Vol. 66, No. 4, 01.01.2003, p. 763-774.

    Ferragina, P, Koudas, N, Muthukrishnan, S & Srivastava, D 2003, 'Two-dimensional substring indexing', Journal of Computer and System Sciences, vol. 66, no. 4, pp. 763-774. https://doi.org/10.1016/S0022-0000(03)00028-X
    Ferragina P, Koudas N, Muthukrishnan S, Srivastava D. Two-dimensional substring indexing. Journal of Computer and System Sciences. 2003 Jan 1;66(4):763-774. https://doi.org/10.1016/S0022-0000(03)00028-X
    Ferragina, Paolo ; Koudas, Nick ; Muthukrishnan, Shanmugavelayutham ; Srivastava, Divesh. / Two-dimensional substring indexing. In: Journal of Computer and System Sciences. 2003 ; Vol. 66, No. 4. pp. 763-774.
    @article{c82b562041224c56a9081cf4d0b47831,
    title = "Two-dimensional substring indexing",
    abstract = "As databases have expanded in scope to storing string data (XML documents, product catalogs), it has become increasingly important to search databases based on matching substrings, often on multiple, correlated dimensions. While string B-trees are I/O optimal in one dimension, no index structure with non-trivial query bounds is known for two-dimensional substring indexing. In this paper, we present a technique for two-dimensional substring indexing based on a reduction to the geometric problem of identifying common colors in two ranges containing colored points. We develop an I/O efficient algorithm for solving the common colors problem, and use it to obtain an I/O efficient (poly-logarithmic query time) algorithm for the two-dimensional substring indexing problem. Our techniques result in a family of secondary memory index structures that trade space for time, with no loss of accuracy.",
    author = "Paolo Ferragina and Nick Koudas and Shanmugavelayutham Muthukrishnan and Divesh Srivastava",
    year = "2003",
    month = "1",
    day = "1",
    doi = "10.1016/S0022-0000(03)00028-X",
    language = "English (US)",
    volume = "66",
    pages = "763--774",
    journal = "Journal of Computer and System Sciences",
    issn = "0022-0000",
    publisher = "Academic Press Inc.",
    number = "4",
    }

    TY - JOUR

    T1 - Two-dimensional substring indexing

    AU - Ferragina, Paolo

    AU - Koudas, Nick

    AU - Muthukrishnan, Shanmugavelayutham

    AU - Srivastava, Divesh

    PY - 2003/1/1

    Y1 - 2003/1/1

    N2 - As databases have expanded in scope to storing string data (XML documents, product catalogs), it has become increasingly important to search databases based on matching substrings, often on multiple, correlated dimensions. While string B-trees are I/O optimal in one dimension, no index structure with non-trivial query bounds is known for two-dimensional substring indexing. In this paper, we present a technique for two-dimensional substring indexing based on a reduction to the geometric problem of identifying common colors in two ranges containing colored points. We develop an I/O efficient algorithm for solving the common colors problem, and use it to obtain an I/O efficient (poly-logarithmic query time) algorithm for the two-dimensional substring indexing problem. Our techniques result in a family of secondary memory index structures that trade space for time, with no loss of accuracy.

    UR - http://www.scopus.com/inward/record.url?scp=0037490904&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=0037490904&partnerID=8YFLogxK

    U2 - 10.1016/S0022-0000(03)00028-X

    DO - 10.1016/S0022-0000(03)00028-X

    M3 - Article

    VL - 66

    SP - 763

    EP - 774

    JO - Journal of Computer and System Sciences

    JF - Journal of Computer and System Sciences

    SN - 0022-0000

    IS - 4

    ER -