Space efficient mining of multigraph streams

Graham Cormode, Shanmugavelayutham Muthukrishnan

    Research output: Contribution to conference › Paper

    Abstract

    The challenge of monitoring massive amounts of data generated by communication networks has led to interest in data stream processing. We study streams of edges in massive communication multigraphs, defined by (source, destination) pairs. The goal is to compute properties of the underlying graph while using small space (much smaller than the number of communicants), and to avoid bias introduced because some edges may appear many times, while others are seen only once. We give results for three fundamental problems on multigraph degree sequences: estimating frequency moments of degrees, finding the heavy hitter degrees, and computing range sums of degree values. In all cases we are able to show space bounds for our summarizing algorithms that are significantly smaller than storing complete information. We use a variety of data stream methods: sketches, sampling, hashing and distinct counting, but a common feature is that we use cascaded summaries: nesting multiple estimation techniques within one another. In our experimental study, we see that such summaries are highly effective, enabling massive multigraph streams to be summarized to answer queries of interest with high accuracy using only a small amount of space.
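    The abstract describes cascaded summaries, in which one estimator is nested inside another, and a degree notion that ignores repeated copies of the same edge. As an illustration only, the Python sketch below places a k-minimum-values (KMV) distinct counter inside each cell of a Count-Min array, so a source's degree (its number of distinct destinations) can be estimated without repeated edges inflating it. The choice of KMV, the names `KMV`, `CountMinOfKMV`, `exact_degrees` and `_hash`, the SHA-256-based hashing, and the parameters `width`, `depth` and `k` are assumptions made for this example; it is not presented as the authors' construction and carries none of the paper's space bounds.

```python
import bisect
import hashlib
from collections import defaultdict


def _hash(*parts):
    """Deterministic 64-bit hash of the given parts (first 8 bytes of SHA-256)."""
    data = "|".join(map(str, parts)).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class KMV:
    """k-minimum-values distinct counter: keeps the k smallest distinct hashes."""

    def __init__(self, k):
        self.k = k
        self.vals = []  # sorted list of at most k distinct hash values

    def add(self, item):
        h = _hash("kmv", item)
        i = bisect.bisect_left(self.vals, h)
        if i < len(self.vals) and self.vals[i] == h:
            return  # repeated item: a distinct counter ignores it
        if len(self.vals) < self.k:
            self.vals.insert(i, h)
        elif h < self.vals[-1]:
            self.vals.insert(i, h)
            self.vals.pop()  # drop the largest so only k values remain

    def estimate(self):
        if len(self.vals) < self.k:
            return float(len(self.vals))  # exact, barring hash collisions
        # KMV estimator: (k - 1) / (k-th smallest hash scaled into (0, 1]).
        return (self.k - 1) / ((self.vals[-1] + 1) / 2**64)


class CountMinOfKMV:
    """Hypothetical cascade: a Count-Min array whose cells count distinct edges."""

    def __init__(self, width, depth, k):
        self.width, self.depth = width, depth
        self.rows = [[KMV(k) for _ in range(width)] for _ in range(depth)]

    def _col(self, r, src):
        return _hash("row", r, src) % self.width

    def update(self, src, dst):
        # Each row routes src to one cell; the cell counts distinct (src, dst)
        # pairs, so repeated copies of the same edge do not inflate it.
        for r in range(self.depth):
            self.rows[r][self._col(r, src)].add((src, dst))

    def degree(self, src):
        # A cell holds the summed degrees of every source hashed there, so it
        # overcounts src (up to KMV error); the minimum over rows is the estimate.
        return min(self.rows[r][self._col(r, src)].estimate()
                   for r in range(self.depth))


def exact_degrees(edges):
    """Ground truth: number of distinct destinations per source."""
    nbrs = defaultdict(set)
    for src, dst in edges:
        nbrs[src].add(dst)
    return {src: len(d) for src, d in nbrs.items()}


if __name__ == "__main__":
    edges = [("a", "x")] * 100 + [("a", "y"), ("a", "z"), ("b", "x")]
    print(exact_degrees(edges))  # {'a': 3, 'b': 1}
    sketch = CountMinOfKMV(width=64, depth=4, k=64)
    for s, d in edges:
        sketch.update(s, d)
    print(sketch.degree("a"), sketch.degree("b"))
```

    Under these assumptions, the edge ("a", "x") seen 100 times contributes once to a's degree, which is the bias the abstract sets out to avoid. Taking the minimum over rows limits overcounting from other sources that share a cell, and the same point query could be used to test candidate sources for unusually large degree.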

    Original language: English (US)
    Pages: 271-282
    Number of pages: 12
    DOI: https://doi.org/10.1145/1065167.1065201
    State: Published - Dec 1 2005
    Event: Twenty-Fourth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2005 - Baltimore, MD, United States
    Duration: Jun 13 2005 to Jun 15 2005

    Conference

    Conference: Twenty-Fourth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2005
    Country: United States
    City: Baltimore, MD
    Period: 6/13/05 to 6/15/05

    Fingerprint

    Telecommunication networks
    Sampling
    Monitoring
    Communication
    Processing

    ASJC Scopus subject areas

    • Software
    • Information Systems
    • Hardware and Architecture

    Cite this

    Cormode, G., & Muthukrishnan, S. (2005). Space efficient mining of multigraph streams. 271-282. Paper presented at Twenty-Fourth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2005, Baltimore, MD, United States. https://doi.org/10.1145/1065167.1065201

    @conference{9ebddff60ad743838b57388fc77c4187,
      title = "Space efficient mining of multigraph streams",
      author = "Graham Cormode and Shanmugavelayutham Muthukrishnan",
      year = "2005",
      month = "12",
      day = "1",
      doi = "10.1145/1065167.1065201",
      language = "English (US)",
      pages = "271--282",
      note = "Twenty-Fourth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2005; Conference date: 13-06-2005 through 15-06-2005",
    }
