Sampling algorithms in a stream operator

Theodore Johnson, Shanmugavelayutham Muthukrishnan, Irina Rozenbaum

    Research output: Contribution to journal, Conference article

    Abstract

    Complex queries over high-speed data streams often need to rely on approximations to keep up with their input. The research community has developed a rich literature on approximate streaming algorithms for this application. Many of these algorithms produce samples of the input stream that have better properties than conventional random sampling. In this paper, we abstract the stream sampling process and design a new stream sample operator. We show how it can be used to implement a wide variety of algorithms that perform sampling and sampling-based aggregations. We also show how to implement the operator in Gigascope, a high-speed stream database specialized for IP network monitoring applications. As an example study, we apply the operator within such an enhanced Gigascope to perform subset-sum sampling, which is of great interest for IP network management. We evaluate this implementation on a live, high-speed Internet traffic data stream and find that (a) the operator is a flexible, versatile addition to Gigascope suitable for tuning and algorithm engineering, and (b) the operator imposes only a small evaluation overhead. To our knowledge, this is the first operational implementation of a wide variety of stream sampling algorithms at line speed within a data stream management system.
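
To make the idea of subset-sum sampling concrete, here is a minimal Python sketch of threshold sampling, one well-known way to build a weighted sample from which subset sums can be estimated without bias. It is not the paper's operator or Gigascope code: the function names, the fixed threshold `z`, and the example flow records are all assumptions made for illustration.

```python
import random


def threshold_sample(items, z, rng=None):
    """Sample (key, weight) pairs with a fixed threshold z (hypothetical helper).

    An item of weight w is kept with probability min(1, w / z). A kept item
    is reported with adjusted weight max(w, z), so its expected adjusted
    weight equals w. Weights are assumed to be non-negative.
    """
    if z <= 0:
        raise ValueError("threshold z must be positive")
    rng = rng or random.Random()
    sample = []
    for key, weight in items:
        # Heavy items (w >= z) are always kept; light ones are kept by chance.
        if weight >= z or rng.random() < weight / z:
            sample.append((key, max(weight, z)))
    return sample


def estimate_subset_sum(sample, predicate):
    """Estimate the total weight of all items whose key satisfies predicate."""
    return sum(weight for key, weight in sample if predicate(key))


if __name__ == "__main__":
    # Made-up flow records: (source address, byte count).
    flows = [("10.0.0.1", 1500.0), ("10.0.0.2", 40.0), ("10.0.0.1", 60.0)]
    sample = threshold_sample(flows, z=100.0, rng=random.Random(7))
    print(estimate_subset_sum(sample, lambda src: src == "10.0.0.1"))
```

In this sketch the threshold is a fixed input. Choosing or adapting it so that the sample stays within a size budget on a live stream is a separate problem that the sketch does not address.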

    Original language: English (US)
    Pages (from-to): 1-12
    Number of pages: 12
    Journal: Proceedings of the ACM SIGMOD International Conference on Management of Data
    State: Published - Dec 1 2005
    Event: SIGMOD 2005: ACM SIGMOD International Conference on Management of Data - Baltimore, MD, United States
    Duration: Jun 14 2005 to Jun 16 2005

    Fingerprint

    Mathematical operators
    Sampling
    Network management
    Set theory
    Agglomeration
    Tuning
    Internet
    Monitoring

    ASJC Scopus subject areas

    • Software
    • Information Systems

    Cite this

    Sampling algorithms in a stream operator. / Johnson, Theodore; Muthukrishnan, Shanmugavelayutham; Rozenbaum, Irina.

    In: Proceedings of the ACM SIGMOD International Conference on Management of Data, 01.12.2005, p. 1-12.

    @article{86a5f568cf184c7da6e52ce203f8c740,
    title = "Sampling algorithms in a stream operator",
    author = "Theodore Johnson and Shanmugavelayutham Muthukrishnan and Irina Rozenbaum",
    year = "2005",
    month = "12",
    day = "1",
    language = "English (US)",
    pages = "1--12",
    journal = "Proceedings of the ACM SIGMOD International Conference on Management of Data",
    issn = "0730-8078",
    publisher = "Association for Computing Machinery (ACM)",

    }

    TY - JOUR

    T1 - Sampling algorithms in a stream operator

    AU - Johnson, Theodore

    AU - Muthukrishnan, Shanmugavelayutham

    AU - Rozenbaum, Irina

    PY - 2005/12/1

    Y1 - 2005/12/1

    UR - http://www.scopus.com/inward/record.url?scp=29844452412&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=29844452412&partnerID=8YFLogxK

    M3 - Conference article

    AN - SCOPUS:29844452412

    SP - 1

    EP - 12

    JO - Proceedings of the ACM SIGMOD International Conference on Management of Data

    JF - Proceedings of the ACM SIGMOD International Conference on Management of Data

    SN - 0730-8078

    ER -