Optimal and approximate computation of summary statistics for range aggregates

A. C. Gilbert, Y. Kotidis, Shanmugavelayutham Muthukrishnan, M. J. Strauss

    Research output: Contribution to conference › Paper

    Abstract

    Fast estimates for aggregate queries are useful in database query optimization, approximate query answering and online query processing. Hence, there has been a lot of focus on "selectivity estimation", that is, computing summary statistics on the underlying data and using them to answer aggregate queries quickly and to a reasonable approximation. We present two sets of results for range aggregate queries, which are amongst the most common queries. First, we focus on histograms as summary statistics and present algorithms for constructing histograms that are provably optimal (or provably approximate) for range queries; these algorithms take (pseudo-)polynomial time. These are the first known optimality or approximation results for arbitrary range queries; previously known results were optimal only for restricted range queries (such as equality queries, hierarchical or prefix range queries). Second, we focus on wavelet-based representations as summary statistics and present fast algorithms for picking wavelet statistics that are provably optimal for range queries. No previously known wavelet-based methods have this property. We perform an experimental study of the various summary representations to show the benefits of our algorithms over the known methods.
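    The histogram setting described above can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not the authors' algorithm: each bucket is assumed to store only its mean (the usual uniform-spread assumption), range-sum queries are answered from prefix sums, the error of a histogram is assumed to be the sum of squared errors over all ranges [lo, hi], and the best bucket boundaries are found by brute-force search rather than by the paper's (pseudo-)polynomial algorithms. All function names are hypothetical.

```python
from itertools import accumulate, combinations


def prefix_sums(values):
    """Return [0, v0, v0+v1, ...] so that any range sum is one subtraction."""
    return [0] + list(accumulate(values))


def range_sum(prefix, lo, hi):
    """Sum of the original values at positions lo..hi (inclusive)."""
    return prefix[hi + 1] - prefix[lo]


def bucket_estimate(data, boundaries):
    """Approximate each value by the mean of its bucket (uniform-spread assumption).

    `boundaries` holds the sorted start index of every bucket; the first is 0.
    """
    ends = boundaries[1:] + [len(data)]
    estimate = []
    for start, end in zip(boundaries, ends):
        mean = sum(data[start:end]) / (end - start)
        estimate.extend([mean] * (end - start))
    return estimate


def total_range_sq_error(data, boundaries):
    """Assumed error measure: squared range-sum error summed over all n(n+1)/2 ranges."""
    exact = prefix_sums(data)
    approx = prefix_sums(bucket_estimate(data, boundaries))
    n = len(data)
    error = 0.0
    for lo in range(n):
        for hi in range(lo, n):
            diff = range_sum(exact, lo, hi) - range_sum(approx, lo, hi)
            error += diff * diff
    return error


def best_histogram(data, num_buckets):
    """Try every placement of bucket boundaries (combinatorially many; illustration only).

    Returns (error, boundaries) for the placement with the smallest error.
    """
    best = None
    for cuts in combinations(range(1, len(data)), num_buckets - 1):
        boundaries = [0] + list(cuts)
        error = total_range_sq_error(data, boundaries)
        if best is None or error < best[0]:
            best = (error, boundaries)
    return best


if __name__ == "__main__":
    data = [2, 2, 8, 8, 1, 1]
    print(best_histogram(data, 3))  # prints (0.0, [0, 2, 4])
```

    For the sample data, the buckets starting at positions 0, 2 and 4 are each constant, so every range is answered exactly and the error is zero. Because the number of boundary placements grows combinatorially with the number of buckets, an exhaustive search like this one only works for tiny inputs, which is why polynomial-time constructions matter.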

    Original language: English (US)
    Pages: 227-236
    Number of pages: 10
    State: Published, Jan 1 2001
    Event: 20th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, Santa Barbara, CA, United States
    Duration: May 21 2001 to May 23 2001

    Conference

    Conference: 20th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems
    Country: United States
    City: Santa Barbara, CA
    Period: 5/21/01 to 5/23/01

    ASJC Scopus subject areas

    • Software
    • Information Systems
    • Hardware and Architecture

    Cite this

    Gilbert, A. C., Kotidis, Y., Muthukrishnan, S., & Strauss, M. J. (2001). Optimal and approximate computation of summary statistics for range aggregates. 227-236. Paper presented at 20th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, Santa Barbara, CA, United States.

    @conference{b5df980b8ba74d488bbe9a880642d9ba,
    title = "Optimal and approximate computation of summary statistics for range aggregates",
    abstract = "Fast estimates for aggregate queries are useful in database query optimization, approximate query answering and online query processing. Hence, there has been a lot of focus on {"}selectivity estimation{"}, that is, computing summary statistics on the underlying data and using them to answer aggregate queries quickly and to a reasonable approximation. We present two sets of results for range aggregate queries, which are amongst the most common queries. First, we focus on histograms as summary statistics and present algorithms for constructing histograms that are provably optimal (or provably approximate) for range queries; these algorithms take (pseudo-)polynomial time. These are the first known optimality or approximation results for arbitrary range queries; previously known results were optimal only for restricted range queries (such as equality queries, hierarchical or prefix range queries). Second, we focus on wavelet-based representations as summary statistics and present fast algorithms for picking wavelet statistics that are provably optimal for range queries. No previously known wavelet-based methods have this property. We perform an experimental study of the various summary representations to show the benefits of our algorithms over the known methods.",
    author = "Gilbert, {A. C.} and Y. Kotidis and Shanmugavelayutham Muthukrishnan and Strauss, {M. J.}",
    year = "2001",
    month = "1",
    day = "1",
    language = "English (US)",
    pages = "227--236",
    note = "20th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems ; Conference date: 21-05-2001 Through 23-05-2001",

    }

    TY  - CONF
    T1  - Optimal and approximate computation of summary statistics for range aggregates
    AU  - Gilbert, A. C.
    AU  - Kotidis, Y.
    AU  - Muthukrishnan, Shanmugavelayutham
    AU  - Strauss, M. J.
    PY  - 2001/1/1
    Y1  - 2001/1/1
    AB  - Fast estimates for aggregate queries are useful in database query optimization, approximate query answering and online query processing. Hence, there has been a lot of focus on "selectivity estimation", that is, computing summary statistics on the underlying data and using them to answer aggregate queries quickly and to a reasonable approximation. We present two sets of results for range aggregate queries, which are amongst the most common queries. First, we focus on histograms as summary statistics and present algorithms for constructing histograms that are provably optimal (or provably approximate) for range queries; these algorithms take (pseudo-)polynomial time. These are the first known optimality or approximation results for arbitrary range queries; previously known results were optimal only for restricted range queries (such as equality queries, hierarchical or prefix range queries). Second, we focus on wavelet-based representations as summary statistics and present fast algorithms for picking wavelet statistics that are provably optimal for range queries. No previously known wavelet-based methods have this property. We perform an experimental study of the various summary representations to show the benefits of our algorithms over the known methods.
    UR  - http://www.scopus.com/inward/record.url?scp=0034819287&partnerID=8YFLogxK
    UR  - http://www.scopus.com/inward/citedby.url?scp=0034819287&partnerID=8YFLogxK
    M3  - Paper
    AN  - SCOPUS:0034819287
    SP  - 227
    EP  - 236
    ER  - 