Estimating dominance norms of multiple data streams

Graham Cormode, Shanmugavelayutham Muthukrishnan

    Research output: Contribution to journal › Article

    Abstract

    There is much focus in the algorithms and database communities on designing tools to manage and mine data streams. Typically, data streams consist of multiple signals. Formally, a stream of multiple signals is (i, a_{i,j}), where the i's correspond to the domain, the j's index the different signals, and a_{i,j} ≥ 0 gives the value of the jth signal at point i. We study the problem of finding norms that are cumulative of the multiple signals in the data stream. For example, consider the max-dominance norm, defined as ∑_i max_j {a_{i,j}}. It may be thought of as estimating the norm of the "upper envelope" of the multiple signals or, alternatively, as estimating the norm of the "marginal" distribution of tabular data streams. It is used in applications to estimate the "worst case influence" of multiple processes, for example in IP traffic analysis, electrical grid monitoring, and the financial domain. In addition, it is a natural measure, generalizing the union of data streams or counting distinct elements in data streams. We present the first known data stream algorithms for estimating max-dominance of multiple signals. In particular, we use workspace and time-per-item that are both sublinear (in fact, poly-logarithmic) in the input size. In contrast, other notions of dominance on streams a, b, namely min-dominance (∑_i min_j {a_{i,j}}), count-dominance (|{i | a_i > b_i}|), and relative-dominance (∑_i a_i / max{1, b_i}), are all impossible to estimate accurately with sublinear space.
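
To make the four definitions in the abstract concrete, the following is a minimal sketch that computes them exactly for two signals. It is not the paper's algorithm: it stores every domain point, so it uses linear rather than poly-logarithmic space and serves only as a reference for what a streaming estimator would approximate. The function name `dominance_norms`, the `(i, j, value)` update format, the signal labels `"a"` and `"b"`, and the choice to sum repeated updates to the same (i, j) are all assumptions made for illustration.

```python
from collections import defaultdict


def dominance_norms(updates, signals=("a", "b")):
    """Exact, linear-space dominance measures for two signals.

    updates: iterable of (i, j, value) triples, where i is a domain point,
    j is one of the two signal labels, and value >= 0.
    """
    first, second = signals
    totals = defaultdict(lambda: defaultdict(float))
    for i, j, value in updates:
        if j not in signals:
            raise ValueError(f"unknown signal label: {j!r}")
        if value < 0:
            raise ValueError("signal values must be non-negative")
        # Assumption: repeated updates to the same (i, j) are summed.
        totals[i][j] += value

    # One (a_i, b_i) pair per domain point; a missing signal counts as 0.
    rows = [(row.get(first, 0.0), row.get(second, 0.0)) for row in totals.values()]
    return {
        "max": sum(max(a, b) for a, b in rows),             # sum_i max_j a_{i,j}
        "min": sum(min(a, b) for a, b in rows),             # sum_i min_j a_{i,j}
        "count": sum(1 for a, b in rows if a > b),          # |{i : a_i > b_i}|
        "relative": sum(a / max(1.0, b) for a, b in rows),  # sum_i a_i / max{1, b_i}
    }


if __name__ == "__main__":
    stream = [(0, "a", 3), (0, "b", 5), (1, "a", 2), (2, "b", 4), (1, "b", 1)]
    print(dominance_norms(stream))

    # With 0/1 values, max-dominance counts the distinct points in the union.
    membership = [(7, "a", 1), (7, "b", 1), (9, "a", 1)]
    print(dominance_norms(membership)["max"])
```

On the first example the per-point pairs are (3, 5), (2, 1), and (0, 4), so the max-dominance is 5 + 2 + 4 = 11 and the min-dominance is 3 + 1 + 0 = 4. The second example illustrates the abstract's remark that max-dominance generalizes counting distinct elements: points 7 and 9 give a value of 2.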

    Original language: English (US)
    Pages (from-to): 148-160
    Number of pages: 13
    Journal: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 2832
    State: Published - Dec 1 2003

    Fingerprint

    Data Streams
    Norm
    Monitoring
    Traffic Analysis
    Workspace
    Marginal Distribution
    Estimate
    Envelope
    Counting
    Logarithmic
    Union
    Grid
    Distinct

    ASJC Scopus subject areas

    • Theoretical Computer Science
    • Computer Science(all)

    Cite this

    Estimating dominance norms of multiple data streams. / Cormode, Graham; Muthukrishnan, Shanmugavelayutham.

    In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 2832, 01.12.2003, p. 148-160.

    @article{0fc665f7204f486dad40ab5c503c9985,
    title = "Estimating dominance norms of multiple data streams",
    author = "Graham Cormode and Shanmugavelayutham Muthukrishnan",
    year = "2003",
    month = "12",
    day = "1",
    language = "English (US)",
    volume = "2832",
    pages = "148--160",
    journal = "Lecture Notes in Computer Science",
    issn = "0302-9743",
    publisher = "Springer Verlag",

    }

    TY - JOUR

    T1 - Estimating dominance norms of multiple data streams

    AU - Cormode, Graham

    AU - Muthukrishnan, Shanmugavelayutham

    PY - 2003/12/1

    Y1 - 2003/12/1

    UR - http://www.scopus.com/inward/record.url?scp=0142245899&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=0142245899&partnerID=8YFLogxK

    M3 - Article

    AN - SCOPUS:0142245899

    VL - 2832

    SP - 148

    EP - 160

    JO - Lecture Notes in Computer Science

    JF - Lecture Notes in Computer Science

    SN - 0302-9743

    ER -