Frugal streaming for estimating quantiles

Qiang Ma, Shanmugavelayutham Muthukrishnan, Mark Sandler

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    Abstract

    Modern applications require processing streams of data to estimate statistical quantities such as quantiles with a small amount of memory. In many such applications, one needs to compute these quantities for each of a large number of groups (e.g., network traffic grouped by source IP address), which further restricts the amount of memory available to the stream of any particular group. We address this challenge and introduce frugal streaming, that is, algorithms that work with a tiny (typically sub-streaming) amount of memory per group. We design a frugal algorithm that uses only one unit of memory per group to compute a quantile for each group. For stochastic streams, where data items are drawn independently from a distribution, we analyze the algorithm and show that it finds an approximation to the quantile rapidly and remains stably close to it. We also propose an extension of this algorithm that uses two units of memory per group. Experiments with real-world data from an HTTP trace and Twitter show that our frugal algorithms are comparable to existing streaming algorithms for estimating any quantile, although these existing algorithms use far more space per group and are unrealistic in frugal applications; further, the two-memory frugal algorithm converges significantly faster than the one-memory algorithm.
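
    The abstract does not spell out the update rule, so the following is only a minimal sketch of a one-memory estimator in the spirit described above, assuming integer-valued items and unit-sized steps. The randomized increment/decrement rule is an assumption based on the commonly cited form of the one-memory frugal update, and the names frugal_step, frugal_quantiles, and seed are hypothetical. Each group keeps exactly one stored number, its current estimate; the shared random generator is not counted as per-group memory.

```python
import random
from collections import defaultdict


def frugal_step(estimate, item, q, rng):
    """Return the updated one-number estimate of the q-quantile.

    Assumed rule: move up by 1 with probability q when the item is above
    the estimate, move down by 1 with probability 1 - q when it is below.
    """
    r = rng.random()
    if item > estimate and r > 1.0 - q:
        return estimate + 1
    if item < estimate and r > q:
        return estimate - 1
    return estimate


def frugal_quantiles(pairs, q, seed=0):
    """Estimate the q-quantile per group from (group, item) pairs.

    Only one integer is stored per group (hypothetical helper).
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must be strictly between 0 and 1")
    rng = random.Random(seed)
    estimates = defaultdict(int)  # every group starts at 0
    for group, item in pairs:
        estimates[group] = frugal_step(estimates[group], item, q, rng)
    return dict(estimates)


if __name__ == "__main__":
    # Synthetic stochastic stream: two groups with different value ranges.
    data_rng = random.Random(42)
    stream = []
    for _ in range(50000):
        stream.append(("a", data_rng.randrange(1000)))
        stream.append(("b", data_rng.randrange(100)))
    print(frugal_quantiles(stream, 0.5))
```

    Under this rule the estimate drifts toward the point where upward and downward moves balance, which is the q-quantile for independently drawn items, and then fluctuates around it. The unit step size means convergence time grows with the distance between the starting value and the true quantile.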

    Original language: English (US)
    Title of host publication: Space-Efficient Data Structures, Streams, and Algorithms - Papers in Honor of J. Ian Munro on the Occasion of His 66th Birthday
    Publisher: Springer-Verlag
    Pages: 77-96
    Number of pages: 20
    ISBN (Print): 9783642402722
    DOI: https://doi.org/10.1007/978-3-642-40273-9_7
    State: Published - Jan 1 2013
    Event: Conference on Space-Efficient Data Structures, Streams, and Algorithms - Waterloo, ON, Canada
    Duration: Aug 15 2013 to Aug 16 2013

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 8066 LNCS
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Other

    Other: Conference on Space-Efficient Data Structures, Streams, and Algorithms
    Country: Canada
    City: Waterloo, ON
    Period: 8/15/13 to 8/16/13

    Fingerprint

    Quantile
    Streaming
    Data storage equipment
    Stream Processing
    HTTP
    Unit
    Network Traffic
    Trace
    Converge
    Approximation
    Processing

    ASJC Scopus subject areas

    • Theoretical Computer Science
    • Computer Science (all)

    Cite this

    Ma, Q., Muthukrishnan, S., & Sandler, M. (2013). Frugal streaming for estimating quantiles. In Space-Efficient Data Structures, Streams, and Algorithms - Papers in Honor of J. Ian Munro on the Occasion of His 66th Birthday (pp. 77-96). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 8066 LNCS). Springer-Verlag. https://doi.org/10.1007/978-3-642-40273-9_7

    @inproceedings{8385dda75a8344b0be7e2b0c2bacaccb,
    title = "Frugal streaming for estimating quantiles",
    author = "Qiang Ma and Shanmugavelayutham Muthukrishnan and Mark Sandler",
    year = "2013",
    month = "1",
    day = "1",
    doi = "10.1007/978-3-642-40273-9_7",
    language = "English (US)",
    isbn = "9783642402722",
    series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
    publisher = "Springer-Verlag",
    pages = "77--96",
    booktitle = "Space-Efficient Data Structures, Streams, and Algorithms - Papers in Honor of J. Ian Munro on the Occasion of His 66th Birthday",

    }

    TY - GEN

    T1 - Frugal streaming for estimating quantiles

    AU - Ma, Qiang

    AU - Muthukrishnan, Shanmugavelayutham

    AU - Sandler, Mark

    PY - 2013/1/1

    Y1 - 2013/1/1

    UR - http://www.scopus.com/inward/record.url?scp=84894110628&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=84894110628&partnerID=8YFLogxK

    U2 - 10.1007/978-3-642-40273-9_7

    DO - 10.1007/978-3-642-40273-9_7

    M3 - Conference contribution

    AN - SCOPUS:84894110628

    SN - 9783642402722

    T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

    SP - 77

    EP - 96

    BT - Space-Efficient Data Structures, Streams, and Algorithms - Papers in Honor of J. Ian Munro on the Occasion of His 66th Birthday

    PB - Springer-Verlag

    ER -