Space- and time-efficient deterministic algorithms for biased quantiles over data streams

Graham Cormode, Flip Korn, Shanmugavelayutham Muthukrishnan, Divesh Srivastava

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    Abstract

    Skew is prevalent in data streams and should be taken into account by algorithms that analyze the data. The problem of finding "biased quantiles" (that is, approximate quantiles that must be more accurate for more extreme values) is a framework for summarizing such skewed data on data streams. We present the first deterministic algorithms for answering biased quantile queries accurately with small (sublinear in the input size) space and time bounds in one pass. The space bound is near-optimal, and the amortized update cost is close to constant, making the approach practical for handling high-speed network data streams. We not only demonstrate theoretical properties of the algorithm but also show that it uses less space than existing methods in many practical settings and is fast to maintain.
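To make the phrase "more accurate for more extreme values" concrete, the following is a minimal sketch of the accuracy requirement only, not of the paper's algorithm. It assumes the common low-biased formulation, in which the answer to a φ-quantile query over n items may be off by at most εφn ranks, compared with εn ranks for ordinary (uniform) approximate quantiles. That formulation, and the function names `uniform_rank_window` and `biased_rank_window`, are illustrative assumptions and do not come from the abstract.

```python
def uniform_rank_window(phi, eps, n):
    """Acceptable ranks for a uniform eps-approximate phi-quantile.

    Assumed guarantee: the returned item's rank lies within eps * n of phi * n.
    """
    target = phi * n
    slack = eps * n
    return max(0.0, target - slack), min(float(n), target + slack)


def biased_rank_window(phi, eps, n):
    """Acceptable ranks for a low-biased eps-approximate phi-quantile.

    Assumed guarantee: the returned item's rank lies within eps * phi * n
    of phi * n, so the window shrinks as phi approaches the extreme (0).
    """
    target = phi * n
    slack = eps * target
    return max(0.0, target - slack), min(float(n), target + slack)


if __name__ == "__main__":
    n, eps = 1_000_000, 0.01
    for phi in (0.5, 0.1, 0.01, 0.001):
        u_lo, u_hi = uniform_rank_window(phi, eps, n)
        b_lo, b_hi = biased_rank_window(phi, eps, n)
        # Compare how wide the acceptable rank range is under each guarantee.
        print(f"phi={phi}: uniform width={u_hi - u_lo:.0f}, biased width={b_hi - b_lo:.0f}")
```

Under these assumptions the uniform window stays roughly the same width for every φ, while the biased window narrows in proportion to φ, which is why a summary that meets the biased guarantee has to retain more detail near the extreme of a skewed distribution.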

    Original language: English (US)
    Title of host publication: Proceedings of the Twenty-Fifth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2006
    Pages: 263-272
    Number of pages: 10
    DOI: https://doi.org/10.1145/1142351.1142389
    State: Published - Dec 1 2006
    Event: 25th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2006 - Chicago, IL, United States
    Duration: Jun 26 2006 to Jun 28 2006

    Publication series

    Name: Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems

    Other

    Other: 25th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2006
    Country: United States
    City: Chicago, IL
    Period: 6/26/06 to 6/28/06

    Fingerprint

    High speed networks
    Costs

    Keywords

    • Biased quantiles
    • Data stream algorithms

    ASJC Scopus subject areas

    • Software
    • Information Systems
    • Hardware and Architecture

    Cite this

    Cormode, G., Korn, F., Muthukrishnan, S., & Srivastava, D. (2006). Space- and time-efficient deterministic algorithms for biased quantiles over data streams. In Proceedings of the Twenty-Fifth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2006 (pp. 263-272). (Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems). https://doi.org/10.1145/1142351.1142389

    @inproceedings{ad667e2072a1411ea2dc2d8851d7862c,
    title = "Space- and time-efficient deterministic algorithms for biased quantiles over data streams",
    abstract = "Skew is prevalent in data streams and should be taken into account by algorithms that analyze the data. The problem of finding {"}biased quantiles{"} (that is, approximate quantiles that must be more accurate for more extreme values) is a framework for summarizing such skewed data on data streams. We present the first deterministic algorithms for answering biased quantile queries accurately with small (sublinear in the input size) space and time bounds in one pass. The space bound is near-optimal, and the amortized update cost is close to constant, making the approach practical for handling high-speed network data streams. We not only demonstrate theoretical properties of the algorithm but also show that it uses less space than existing methods in many practical settings and is fast to maintain.",
    keywords = "Biased quantiles, Data stream algorithms",
    author = "Graham Cormode and Flip Korn and Shanmugavelayutham Muthukrishnan and Divesh Srivastava",
    year = "2006",
    month = "12",
    day = "1",
    doi = "10.1145/1142351.1142389",
    language = "English (US)",
    isbn = "1595933182",
    series = "Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems",
    pages = "263--272",
    booktitle = "Proceedings of the Twenty-Fifth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2006",

    }

    TY - GEN

    T1 - Space- and time-efficient deterministic algorithms for biased quantiles over data streams

    AU - Cormode, Graham

    AU - Korn, Flip

    AU - Muthukrishnan, Shanmugavelayutham

    AU - Srivastava, Divesh

    PY - 2006/12/1

    Y1 - 2006/12/1

    N2 - Skew is prevalent in data streams and should be taken into account by algorithms that analyze the data. The problem of finding "biased quantiles" (that is, approximate quantiles that must be more accurate for more extreme values) is a framework for summarizing such skewed data on data streams. We present the first deterministic algorithms for answering biased quantile queries accurately with small (sublinear in the input size) space and time bounds in one pass. The space bound is near-optimal, and the amortized update cost is close to constant, making the approach practical for handling high-speed network data streams. We not only demonstrate theoretical properties of the algorithm but also show that it uses less space than existing methods in many practical settings and is fast to maintain.

    AB - Skew is prevalent in data streams and should be taken into account by algorithms that analyze the data. The problem of finding "biased quantiles" (that is, approximate quantiles that must be more accurate for more extreme values) is a framework for summarizing such skewed data on data streams. We present the first deterministic algorithms for answering biased quantile queries accurately with small (sublinear in the input size) space and time bounds in one pass. The space bound is near-optimal, and the amortized update cost is close to constant, making the approach practical for handling high-speed network data streams. We not only demonstrate theoretical properties of the algorithm but also show that it uses less space than existing methods in many practical settings and is fast to maintain.

    KW - Biased quantiles

    KW - Data stream algorithms

    UR - http://www.scopus.com/inward/record.url?scp=33846219851&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=33846219851&partnerID=8YFLogxK

    U2 - 10.1145/1142351.1142389

    DO - 10.1145/1142351.1142389

    M3 - Conference contribution

    AN - SCOPUS:33846219851

    SN - 1595933182

    SN - 9781595933188

    T3 - Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems

    SP - 263

    EP - 272

    BT - Proceedings of the Twenty-Fifth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2006

    ER -