Summarizing two-dimensional data with skyline-based statistical descriptors

Graham Cormode, Flip Korn, Shanmugavelayutham Muthukrishnan, Divesh Srivastava

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Much real data consists of more than one dimension, such as financial transactions (e.g., price × volume) and IP network flows (e.g., duration × numBytes), and captures relationships between the variables. For a single dimension, quantiles are intuitive and robust descriptors. Processing and analyzing multidimensional data, particularly in data warehouse or data streaming settings, requires similarly robust and informative statistical descriptors that go beyond one dimension. Applying quantile methods to summarize a multidimensional distribution along only singleton attributes ignores the rich dependence amongst the variables. In this paper, we present new skyline-based statistical descriptors for capturing the distributions over pairs of dimensions. They generalize the notion of quantiles in the individual dimensions, and also incorporate properties of the joint distribution. We introduce φ-quantours and α-radials, which are skyline points over subsets of the data, and propose (φ, α)-quantiles, found from the union of these skylines, as statistical descriptors of two-dimensional distributions. We present efficient online algorithms for tracking (φ, α)-quantiles on two-dimensional streams using guaranteed small space. We identify the principal properties of the proposed descriptors and perform extensive experiments with synthetic and real IP traffic data to study the efficiency of our proposed algorithms.
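
To make the two building blocks named in the abstract concrete, here is a minimal Python sketch. It assumes the common definition of a skyline (a point is kept if no other point is at least as large in both coordinates) and a simple rank-based quantile. The function names `skyline` and `quantile` and the sample points are hypothetical, chosen for illustration. The abstract does not give the precise definitions of φ-quantours or α-radials, so this does not implement them or the paper's streaming algorithms.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def skyline(points: Sequence[Point]) -> List[Point]:
    """Return the points not dominated by any other point.

    Assumption: larger is better in both coordinates, so p is dominated
    if some other point q has q.x >= p.x and q.y >= p.y.
    """
    result: List[Point] = []
    best_y = float("-inf")
    # Sweep from largest x to smallest; a point survives only if its y
    # exceeds every y seen so far (all of which have x at least as large).
    for x, y in sorted(set(points), key=lambda p: (-p[0], -p[1])):
        if y > best_y:
            result.append((x, y))
            best_y = y
    return result


def quantile(values: Sequence[float], phi: float) -> float:
    """Return the item of rank ceil(phi * n) in sorted order (offline, exact)."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(phi * len(ordered)))
    return ordered[min(rank, len(ordered)) - 1]


# Hypothetical sample data: (x, y) pairs such as (duration, numBytes).
points = [(1, 5), (2, 4), (3, 1), (2, 2), (1, 1), (3, 3)]
sky = skyline(points)
median_x = quantile([p[0] for p in points], 0.5)
print(sky, median_x)
```

The per-dimension median looks at one attribute in isolation, while the skyline depends on both coordinates jointly, which is the kind of dependence the abstract says singleton-attribute quantiles ignore.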

    Original language: English (US)
    Title of host publication: Scientific and Statistical Database Management - 20th International Conference, SSDBM 2008, Proceedings
    Pages: 42-60
    Number of pages: 19
    DOI: https://doi.org/10.1007/978-3-540-69497-7_6
    State: Published - Aug 14 2008
    Event: 20th International Conference on Scientific and Statistical Database Management, SSDBM 2008 - Hong Kong, China
    Duration: Jul 9 2008 → Jul 11 2008

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 5069 LNCS
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Other

    Other: 20th International Conference on Scientific and Statistical Database Management, SSDBM 2008
    Country: China
    City: Hong Kong
    Period: 7/9/08 → 7/11/08

    Fingerprint

    Skyline
    Quantile
    Descriptors
    Data warehouses
    One Dimension
    Processing
    Streaming Data
    IP Networks
    Network Flow
    Data Warehouse
    Online Algorithms
    Joint Distribution
    Experiments
    Transactions
    Intuitive
    Union
    Efficient Algorithms
    Attribute
    Traffic
    Generalise

    ASJC Scopus subject areas

    • Theoretical Computer Science
    • Computer Science (all)

    Cite this

    Cormode, G., Korn, F., Muthukrishnan, S., & Srivastava, D. (2008). Summarizing two-dimensional data with skyline-based statistical descriptors. In Scientific and Statistical Database Management - 20th International Conference, SSDBM 2008, Proceedings (pp. 42-60). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5069 LNCS). https://doi.org/10.1007/978-3-540-69497-7_6

    @inproceedings{eb9af207bd8c49fbb4ec2a7e5736a339,
    title = "Summarizing two-dimensional data with skyline-based statistical descriptors",
    author = "Graham Cormode and Flip Korn and Shanmugavelayutham Muthukrishnan and Divesh Srivastava",
    year = "2008",
    month = "8",
    day = "14",
    doi = "10.1007/978-3-540-69497-7_6",
    language = "English (US)",
    isbn = "3540694765",
    series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
    pages = "42--60",
    booktitle = "Scientific and Statistical Database Management - 20th International Conference, SSDBM 2008, Proceedings",

    }

    TY - GEN

    T1 - Summarizing two-dimensional data with skyline-based statistical descriptors

    AU - Cormode, Graham

    AU - Korn, Flip

    AU - Muthukrishnan, Shanmugavelayutham

    AU - Srivastava, Divesh

    PY - 2008/8/14

    Y1 - 2008/8/14

    UR - http://www.scopus.com/inward/record.url?scp=49049093043&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=49049093043&partnerID=8YFLogxK

    U2 - 10.1007/978-3-540-69497-7_6

    DO - 10.1007/978-3-540-69497-7_6

    M3 - Conference contribution

    AN - SCOPUS:49049093043

    SN - 3540694765

    SN - 9783540694762

    T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

    SP - 42

    EP - 60

    BT - Scientific and Statistical Database Management - 20th International Conference, SSDBM 2008, Proceedings

    ER -