Domain-driven data synopses for dynamic quantiles

Anna C. Gilbert, Yannis Kotidis, Shanmugavelayutham Muthukrishnan, Martin J. Strauss

    Research output: Contribution to journal (Article)

    Abstract

    In this paper, we present new algorithms for dynamically computing quantiles of a relation subject to insert as well as delete operations. At the core of our algorithms lies a small-space multiresolution representation of the underlying data distribution based on random subset sums or RSSs. These RSSs are updated with every insert and delete operation. When quantiles are demanded, we use these RSSs to estimate quickly, without having to access the data, all the quantiles, each guaranteed to be accurate to within user-specified precision. While quantiles have found many uses in databases, in this paper, our focus is primarily on network management applications that monitor the distribution of active sessions in the network. Our examples are drawn both from the telephony and the IP network, where the goal is to monitor the distribution of the length of active calls and IP flows, respectively, over time. For such applications, we propose a new type of histogram that uses RSSs for summarizing the dynamic parts of the distributions while other parts with small volume of sessions are approximated using simple counters.
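The update-and-query cycle described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The class name `RandomSubsetSums`, the parameters `log_u`, `reps`, and `seed`, the explicit membership tables, and the plain averaging of estimates are all assumptions made for illustration. The sketch keeps, for each level of a dyadic (multiresolution) partition of an integer domain, several sums of item counts over random subsets of intervals. Inserts and deletes adjust those sums by +1 or -1, and ranks and quantiles are then estimated from the sums alone.

```python
import random


class RandomSubsetSums:
    """Toy random-subset-sum sketch over the integer domain [0, 2**log_u)."""

    def __init__(self, log_u=8, reps=64, seed=0):
        rng = random.Random(seed)
        self.log_u = log_u
        self.reps = reps
        self.n = 0  # exact number of live items
        # member[r][level][j] is True when dyadic interval j of that level
        # (width 2**(log_u - level)) belongs to random subset r.
        # Assumption: tables are stored explicitly to keep the sketch short.
        self.member = [
            [[rng.random() < 0.5 for _ in range(2 ** level)]
             for level in range(log_u + 1)]
            for _ in range(reps)
        ]
        # sums[r][level] counts live items whose interval lies in subset r.
        self.sums = [[0] * (log_u + 1) for _ in range(reps)]

    def _update(self, x, delta):
        self.n += delta
        for r in range(self.reps):
            for level in range(self.log_u + 1):
                if self.member[r][level][x >> (self.log_u - level)]:
                    self.sums[r][level] += delta

    def insert(self, x):
        self._update(x, +1)

    def delete(self, x):
        self._update(x, -1)

    def interval_count(self, level, j):
        """Estimate how many live items fall in dyadic interval j of a level."""
        total = 0
        for r in range(self.reps):
            sign = 1 if self.member[r][level][j] else -1
            # 2 * sum - n is a +/-1 weighted sum of all interval counts;
            # multiplying by this interval's own sign isolates its count
            # in expectation.
            total += sign * (2 * self.sums[r][level] - self.n)
        return total / self.reps

    def rank(self, x):
        """Estimate the number of live items strictly below x."""
        est = 0.0
        for level in range(1, self.log_u + 1):
            j = x >> (self.log_u - level)
            if j & 1:  # the left sibling lies entirely inside [0, x)
                est += self.interval_count(level, j - 1)
        return est

    def quantile(self, phi):
        """Binary-search the domain for an approximate phi-quantile."""
        target = phi * self.n
        lo, hi = 0, 2 ** self.log_u - 1
        while lo < hi:
            mid = (lo + hi) // 2
            if self.rank(mid + 1) >= target:
                hi = mid
            else:
                lo = mid + 1
        return lo
```

Because a delete reverses exactly what the matching insert did, the sums depend only on the items currently live. The estimates are noisy, and their error shrinks as `reps` grows. A practical version would derive subset membership from a short random seed instead of storing full tables.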

    Original language: English (US)
    Pages (from-to): 927-937
    Number of pages: 11
    Journal: IEEE Transactions on Knowledge and Data Engineering
    Volume: 17
    Issue number: 7
    DOI: 10.1109/TKDE.2005.108
    State: Published - Jul 1 2005

    Fingerprint

    • RSS
    • Network management

    Keywords

    • Data streams
    • Database statistics
    • Quantiles

    ASJC Scopus subject areas

    • Information Systems
    • Computer Science Applications
    • Computational Theory and Mathematics

    Cite this

    Gilbert, A. C., Kotidis, Y., Muthukrishnan, S., & Strauss, M. J. (2005). Domain-driven data synopses for dynamic quantiles. IEEE Transactions on Knowledge and Data Engineering, 17(7), 927-937. https://doi.org/10.1109/TKDE.2005.108

    @article{b5b4df9ef83441f890b7f77230bd5044,
    title = "Domain-driven data synopses for dynamic quantiles",
    keywords = "Data streams, Database statistics, Quantiles",
    author = "Gilbert, {Anna C.} and Yannis Kotidis and Shanmugavelayutham Muthukrishnan and Strauss, {Martin J.}",
    year = "2005",
    month = "7",
    day = "1",
    doi = "10.1109/TKDE.2005.108",
    language = "English (US)",
    volume = "17",
    pages = "927--937",
    journal = "IEEE Transactions on Knowledge and Data Engineering",
    issn = "1041-4347",
    publisher = "IEEE Computer Society",
    number = "7",
    }

    TY - JOUR
    T1 - Domain-driven data synopses for dynamic quantiles
    AU - Gilbert, Anna C.
    AU - Kotidis, Yannis
    AU - Muthukrishnan, Shanmugavelayutham
    AU - Strauss, Martin J.
    PY - 2005/7/1
    Y1 - 2005/7/1
    KW - Data streams
    KW - Database statistics
    KW - Quantiles
    UR - http://www.scopus.com/inward/record.url?scp=22944450669&partnerID=8YFLogxK
    UR - http://www.scopus.com/inward/citedby.url?scp=22944450669&partnerID=8YFLogxK
    U2 - 10.1109/TKDE.2005.108
    DO - 10.1109/TKDE.2005.108
    M3 - Article
    AN - SCOPUS:22944450669
    VL - 17
    SP - 927
    EP - 937
    JO - IEEE Transactions on Knowledge and Data Engineering
    JF - IEEE Transactions on Knowledge and Data Engineering
    SN - 1041-4347
    IS - 7
    ER -