Two-dimensional substring indexing

P. Ferragina, N. Koudas, Shanmugavelayutham Muthukrishnan, D. Srivastava

    Research output: Contribution to conference › Paper

    Abstract

    As databases have expanded in scope to storing string data (XML documents, product catalogs), it has become increasingly important to search databases based on matching substrings, often on multiple, correlated dimensions. While string B-trees are I/O optimal in one dimension, no index structure with non-trivial query bounds is known for two-dimensional substring indexing. In this paper, we present a technique for two-dimensional substring indexing based on a reduction to the geometric problem of identifying common colors in two ranges containing colored points. We develop an I/O efficient algorithm for solving the common colors problem, and use it to obtain an I/O efficient (poly-logarithmic query time) algorithm for the two-dimensional substring indexing problem. Our techniques result in a family of secondary memory index structures that trade space for time, with no loss of accuracy. We show how our technique can be practically realized using a combination of string B-trees and R-trees.
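The reduction described in the abstract can be illustrated with a minimal in-memory Python sketch. It assumes each record carries one string per dimension. For each dimension, the sketch sorts every suffix of every string, using a plain sorted list as a stand-in for a string B-tree, and colors each suffix with its record id. A substring pattern then selects one contiguous range of sorted suffixes per dimension, and the query answer is the set of colors common to both ranges. The function names (`build_suffix_index`, `prefix_range`, `common_colors`, `query_2d`) and the sample data are hypothetical. The brute-force set intersection is not the paper's I/O-efficient common-colors algorithm, and storing suffixes explicitly uses quadratic space.

```python
from bisect import bisect_left


def build_suffix_index(strings):
    """Sort every suffix of every string in one dimension.

    `strings` maps a record id to that record's string in this dimension.
    Returns parallel lists (suffixes, colors), where colors[k] is the record id
    (the "color") of the k-th smallest suffix. Copying each suffix costs
    quadratic space; a real index would store suffix pointers instead.
    """
    pairs = sorted((s[i:], rid) for rid, s in strings.items() for i in range(len(s)))
    return [suf for suf, _ in pairs], [rid for _, rid in pairs]


def prefix_range(suffixes, pattern):
    """Return [lo, hi): suffixes starting with `pattern` are contiguous when sorted."""
    lo = bisect_left(suffixes, pattern)
    hi = lo
    while hi < len(suffixes) and suffixes[hi].startswith(pattern):
        hi += 1  # linear scan keeps the sketch simple
    return lo, hi


def common_colors(colors1, range1, colors2, range2):
    """Naive common-colors query: record ids that occur in both ranges."""
    (lo1, hi1), (lo2, hi2) = range1, range2
    return set(colors1[lo1:hi1]) & set(colors2[lo2:hi2])


def query_2d(index1, index2, p1, p2):
    """Records whose dimension-1 string contains p1 and dimension-2 string contains p2."""
    suffixes1, colors1 = index1
    suffixes2, colors2 = index2
    return common_colors(colors1, prefix_range(suffixes1, p1),
                         colors2, prefix_range(suffixes2, p2))


# Hypothetical two-dimensional records: (name, category) per record id.
names = {1: "john smith", 2: "jane smithers", 3: "bob jones"}
categories = {1: "databases", 2: "data mining", 3: "databases"}
idx_names = build_suffix_index(names)
idx_categories = build_suffix_index(categories)

print(sorted(query_2d(idx_names, idx_categories, "smith", "data")))  # [1, 2]
print(sorted(query_2d(idx_names, idx_categories, "smith", "base")))  # [1]
```

Because all suffixes sharing a prefix are adjacent in sorted order, each dimension contributes a single range no matter how many records match. The abstract's I/O-efficient common-colors algorithm would replace the brute-force intersection used here.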

    Original language: English (US)
    Pages: 282-288
    Number of pages: 7
    State: Published - Jan 1 2001
    Event: 20th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems - Santa Barbara, CA, United States
    Duration: May 21 2001 to May 23 2001

    Conference

    Conference: 20th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems
    Country: United States
    City: Santa Barbara, CA
    Period: 5/21/01 - 5/23/01

    ASJC Scopus subject areas

    • Software
    • Information Systems
    • Hardware and Architecture

    Cite this

    Ferragina, P., Koudas, N., Muthukrishnan, S., & Srivastava, D. (2001). Two-dimensional substring indexing (pp. 282-288). Paper presented at 20th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, Santa Barbara, CA, United States.

    @conference{d7888e883a434346a83c758e695c6d29,
    title = "Two-dimensional substring indexing",
    author = "P. Ferragina and N. Koudas and Shanmugavelayutham Muthukrishnan and D. Srivastava",
    year = "2001",
    month = "1",
    day = "1",
    language = "English (US)",
    pages = "282--288",
    note = "20th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems ; Conference date: 21-05-2001 Through 23-05-2001",

    }

    TY - CONF

    T1 - Two-dimensional substring indexing

    AU - Ferragina, P.

    AU - Koudas, N.

    AU - Muthukrishnan, Shanmugavelayutham

    AU - Srivastava, D.

    PY - 2001/1/1

    Y1 - 2001/1/1

    UR - http://www.scopus.com/inward/record.url?scp=0034819893&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=0034819893&partnerID=8YFLogxK

    M3 - Paper

    AN - SCOPUS:0034819893

    SP - 282

    EP - 288

    ER -