Query-aware partitioning for monitoring massive network data streams

Theodore Johnson, S. Muthukrishnan, Vladislav Shkapenyuk, Oliver Spatscheck

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Data Stream Management Systems (DSMS) are gaining acceptance for applications that need to process very large volumes of data in real time. The load generated by such applications frequently far exceeds the computational capabilities of a single centralized server. In particular, a single-server instance of our DSMS, Gigascope, cannot keep up with the processing demands of the new OC-768 networks, which can generate more than 100 million packets per second. In this paper, we explore a mechanism for the distributed processing of very high speed data streams. Existing distributed DSMSs employ two mechanisms for distributing the load across the participating machines: partitioning of the query execution plans and partitioning of the input data stream in a query-independent fashion. However, for a large class of queries, both approaches fail to reduce the load compared to a centralized system, and can even increase it. In this paper we present an alternative approach, query-aware data stream partitioning, that allows for more efficient scaling. We have developed methods to analyze any given query node to determine a partitioning strategy, to reconcile the potentially conflicting requirements that different queries in a query set place on partitioning, and to choose an optimal partitioning that minimizes overall communication costs.
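The contrast the abstract draws between query-independent and query-aware partitioning can be illustrated with a small sketch. This is not Gigascope code or the paper's algorithm. It assumes a hypothetical query that sums bytes per source address, a hypothetical packet layout (`src_ip`, `dst_ip`, `length`), and CRC32 as a stand-in hash. All function names are made up for illustration.

```python
from collections import defaultdict
import zlib


def round_robin_partition(packets, n_hosts):
    """Query-independent: spread packets evenly, ignoring their contents."""
    parts = [[] for _ in range(n_hosts)]
    for i, pkt in enumerate(packets):
        parts[i % n_hosts].append(pkt)
    return parts


def key_hash_partition(packets, n_hosts, key_fields):
    """Query-aware (assumed form): hash on the query's grouping fields."""
    parts = [[] for _ in range(n_hosts)]
    for pkt in packets:
        key = "|".join(str(pkt[f]) for f in key_fields).encode()
        parts[zlib.crc32(key) % n_hosts].append(pkt)
    return parts


def local_bytes_per_source(partition):
    """Hypothetical query run on one host: total bytes per source address."""
    totals = defaultdict(int)
    for pkt in partition:
        totals[pkt["src_ip"]] += pkt["length"]
    return dict(totals)


def groups_split_across_hosts(parts):
    """Count groups whose partial results sit on more than one host.

    Each such group forces partial aggregates to be shipped and merged.
    """
    hosts_per_group = defaultdict(set)
    for host, partition in enumerate(parts):
        for src in local_bytes_per_source(partition):
            hosts_per_group[src].add(host)
    return sum(1 for hosts in hosts_per_group.values() if len(hosts) > 1)


# Synthetic traffic: 40 packets from 5 source addresses.
packets = [
    {"src_ip": f"10.0.0.{i % 5}", "dst_ip": "10.1.0.1", "length": 100 + i}
    for i in range(40)
]

rr_parts = round_robin_partition(packets, 4)
aware_parts = key_hash_partition(packets, 4, ["src_ip"])

print("groups split (round robin):", groups_split_across_hosts(rr_parts))
print("groups split (hash on src_ip):", groups_split_across_hosts(aware_parts))
```

Under these assumptions, hashing on the grouping field keeps every group on a single host, so each host's result is final and no partial aggregates need to be merged. With round-robin placement, groups are spread over several hosts and their partial results must be combined elsewhere.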

    Original language: English (US)
    Title of host publication: Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, ICDE'08
    Pages: 1528-1530
    Number of pages: 3
    DOI: https://doi.org/10.1109/ICDE.2008.4497612
    State: Published - Oct 1 2008
    Event: 2008 IEEE 24th International Conference on Data Engineering, ICDE'08 - Cancun, Mexico
    Duration: Apr 7 2008 to Apr 12 2008

    Publication series

    Name: Proceedings - International Conference on Data Engineering
    ISSN (Print): 1084-4627

    Other

    Other: 2008 IEEE 24th International Conference on Data Engineering, ICDE'08
    Country: Mexico
    City: Cancun
    Period: 4/7/08 to 4/12/08

    ASJC Scopus subject areas

    • Software
    • Signal Processing
    • Information Systems

    Cite this

    Johnson, T., Muthukrishnan, S., Shkapenyuk, V., & Spatscheck, O. (2008). Query-aware partitioning for monitoring massive network data streams. In Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, ICDE'08 (pp. 1528-1530). [4497612] (Proceedings - International Conference on Data Engineering). https://doi.org/10.1109/ICDE.2008.4497612