What's different: Distributed, continuous monitoring of duplicate-resilient aggregates on data streams

Graham Cormode, Shanmugavelayutham Muthukrishnan, Wei Zhuang

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Emerging applications in sensor systems and network-wide IP traffic analysis present many technical challenges. They need distributed monitoring and continuous tracking of events. They have severe resource constraints not only at each site in terms of per-update processing time and archival space for high-speed streams of observations, but also, crucially, communication constraints for collaborating on the monitoring task. These elements have been addressed in a series of recent works. A fundamental issue that arises is that one cannot make the "uniqueness" assumption on observed events which is present in previous works, since wide-scale monitoring invariably encounters the same events at different points. For example, within the network of an Internet Service Provider, packets of the same flow will be observed in different routers; similarly, the same individual will be observed by multiple mobile sensors in monitoring wild animals. Aggregates of interest on such distributed environments must be resilient to duplicate observations. We study such duplicate-resilient aggregates that measure the extent of the duplication (how many unique observations are there, how many observations are unique) as well as standard holistic aggregates such as quantiles and heavy hitters over the unique items. We present accuracy-guaranteed, highly communication-efficient algorithms for these aggregates that work within the time and space constraints of high-speed streams. We also present results of a detailed experimental study on both real-life and synthetic data.
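To make "duplicate-resilient" concrete for the distinct-count aggregate, the following is a minimal sketch and not the algorithm from the paper. It uses a k-minimum-values (KMV) summary, a standard technique chosen here purely for illustration. The names `KMVSketch` and `item_hash`, the parameter `k`, and the flow identifiers are all hypothetical. The point it shows: because the same item hashes to the same value at every site, merging per-site summaries does not double count items seen at several sites.

```python
import hashlib


def item_hash(item):
    """Map an item to a pseudo-random value in [0, 1).

    Identical items always get the same value, at every site.
    """
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


class KMVSketch:
    """Illustrative k-minimum-values summary (hypothetical class, not from the paper)."""

    def __init__(self, k=256):
        self.k = k
        self.values = set()  # the k smallest distinct hash values seen so far

    def add(self, item):
        # Re-adding a duplicate is a no-op because the hash is already in the set.
        self.values.add(item_hash(item))
        if len(self.values) > self.k:
            # O(k) eviction; a heap would be faster but is omitted for clarity.
            self.values.remove(max(self.values))

    def merge(self, other):
        """Combine two summaries built with the same k; shared items count once."""
        merged = KMVSketch(self.k)
        merged.values = set(sorted(self.values | other.values)[: self.k])
        return merged

    def estimate(self):
        """Estimate the number of distinct items summarized."""
        if len(self.values) < self.k:
            return float(len(self.values))  # fewer than k distinct hashes: exact
        return (self.k - 1) / max(self.values)


# Two hypothetical routers observe overlapping sets of flows.
router_a = KMVSketch(k=256)
router_b = KMVSketch(k=256)
for i in range(0, 6000):
    router_a.add(f"flow-{i}")
for i in range(4000, 10000):
    router_b.add(f"flow-{i}")

combined = router_a.merge(router_b)
# The estimate approximates the 10000 distinct flows,
# not the 12000 total observations.
print(round(combined.estimate()))
```

Each site ships only `k` hash values rather than its raw stream, which is the kind of communication saving the abstract is concerned with. The continuous-tracking protocols and accuracy guarantees of the paper are not reproduced here.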

    Original language: English (US)
    Title of host publication: Proceedings of the 22nd International Conference on Data Engineering, ICDE '06
    Number of pages: 1
    DOIs: https://doi.org/10.1109/ICDE.2006.173
    State: Published - Oct 17 2006
    Event: 22nd International Conference on Data Engineering, ICDE '06 - Atlanta, GA, United States
    Duration: Apr 3 2006 to Apr 7 2006

    Publication series

    Name: Proceedings - International Conference on Data Engineering
    Volume: 2006
    ISSN (Print): 1084-4627

    Other

    Other: 22nd International Conference on Data Engineering, ICDE '06
    Country: United States
    City: Atlanta, GA
    Period: 4/3/06 to 4/7/06

    Fingerprint

    Monitoring
    Internet service providers
    Communication
    Sensors
    Routers
    Animals
    Processing

    ASJC Scopus subject areas

    • Software
    • Signal Processing
    • Information Systems

    Cite this

    Cormode, G., Muthukrishnan, S., & Zhuang, W. (2006). What's different: Distributed, continuous monitoring of duplicate-resilient aggregates on data streams. In Proceedings of the 22nd International Conference on Data Engineering, ICDE '06 [1617425] (Proceedings - International Conference on Data Engineering; Vol. 2006). https://doi.org/10.1109/ICDE.2006.173

    @inproceedings{3f6c743607f549ccb101a80d22f5043a,
    title = "What's different: Distributed, continuous monitoring of duplicate-resilient aggregates on data streams",
    author = "Graham Cormode and Shanmugavelayutham Muthukrishnan and Wei Zhuang",
    year = "2006",
    month = "10",
    day = "17",
    doi = "10.1109/ICDE.2006.173",
    language = "English (US)",
    isbn = "0769525709",
    series = "Proceedings - International Conference on Data Engineering",
    booktitle = "Proceedings of the 22nd International Conference on Data Engineering, ICDE '06",

    }

    TY - GEN

    T1 - What's different

    T2 - Distributed, continuous monitoring of duplicate-resilient aggregates on data streams

    AU - Cormode, Graham

    AU - Muthukrishnan, Shanmugavelayutham

    AU - Zhuang, Wei

    PY - 2006/10/17

    Y1 - 2006/10/17

    UR - http://www.scopus.com/inward/record.url?scp=33749591511&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=33749591511&partnerID=8YFLogxK

    U2 - 10.1109/ICDE.2006.173

    DO - 10.1109/ICDE.2006.173

    M3 - Conference contribution

    AN - SCOPUS:33749591511

    SN - 0769525709

    SN - 9780769525709

    T3 - Proceedings - International Conference on Data Engineering

    BT - Proceedings of the 22nd International Conference on Data Engineering, ICDE '06

    ER -