Sequential change detection on data streams

Shanmugavelayutham Muthukrishnan, Eric Van Den Berg, Yihua Wu

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Model-based declarative queries are becoming an attractive paradigm for interacting with many data stream applications. This has led to the development of techniques that accurately answer queries using distributional models rather than raw values. The quintessential problem with this approach is detecting when the input stream changes, which makes the models stale and inaccurate. We adopt the sound statistical method of sequential hypothesis testing to study this problem on streams, without assuming independence. It yields algorithms that are fast, space-efficient, and oblivious to the data's underlying distributions. Our experiments demonstrate that our methods effectively determine not only the existence of a change but also the point at which the change begins, relative to the ground truth we obtain. Our methods work seamlessly without the window limitations inherent in prior work, and thus have clearly shorter detection delays than alternative window-based solutions.
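
    As a rough illustration of the sequential-testing idea described in the abstract, the sketch below implements the classic one-sided CUSUM test (a repeated sequential probability ratio test) for a shift in the mean of a Gaussian stream. It is not the authors' algorithm: it assumes known pre- and post-change means, a known variance, and independent observations, none of which the paper requires. The names CusumDetector, detect, mu0, mu1, sigma, and threshold are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class CusumDetector:
    """One-sided CUSUM for a Gaussian mean shift mu0 -> mu1 (illustrative only)."""
    mu0: float        # assumed pre-change mean
    mu1: float        # assumed post-change mean
    sigma: float      # assumed known standard deviation
    threshold: float  # alarm threshold on the cumulative statistic
    stat: float = 0.0     # current CUSUM statistic
    seen: int = 0         # number of items consumed so far
    candidate: int = 0    # index where the current upward run started

    def update(self, x: float) -> bool:
        """Consume one item in O(1) time and space; return True on alarm."""
        # Log-likelihood ratio of N(mu1, sigma^2) against N(mu0, sigma^2) at x.
        llr = (self.mu1 - self.mu0) / self.sigma ** 2 * (x - (self.mu0 + self.mu1) / 2.0)
        if self.stat == 0.0:
            # The statistic restarts here, so x is the candidate change point.
            self.candidate = self.seen
        self.stat = max(0.0, self.stat + llr)
        self.seen += 1
        return self.stat > self.threshold


def detect(stream: Iterable[float], detector: CusumDetector) -> Optional[Tuple[int, int]]:
    """Return (alarm_index, estimated_change_index), or None if no alarm fires."""
    for x in stream:
        if detector.update(x):
            return detector.seen - 1, detector.candidate
    return None


# Noise-free toy stream whose mean jumps from 0 to 1 at index 50.
stream = [0.0] * 50 + [1.0] * 50
result = detect(stream, CusumDetector(mu0=0.0, mu1=1.0, sigma=1.0, threshold=4.0))
print(result)  # (58, 50): alarm at index 58, change estimated at index 50
```

    The detector keeps a single running statistic instead of a window of past items, which is the property the abstract contrasts with window-based approaches. The gap between the estimated change index and the alarm index is the detection delay, and raising the threshold trades a longer delay for fewer false alarms.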

    Original language: English (US)
    Title of host publication: ICDM Workshops 2007 - Proceedings of the 17th IEEE International Conference on Data Mining Workshops
    Pages: 551-556
    Number of pages: 6
    DOIs: https://doi.org/10.1109/ICDMW.2007.89
    State: Published - Dec 1 2007
    Event: 17th IEEE International Conference on Data Mining Workshops, ICDM Workshops 2007 - Omaha, NE, United States
    Duration: Oct 28 2007 to Oct 31 2007

    Publication series

    Name: Proceedings - IEEE International Conference on Data Mining, ICDM
    ISSN (Print): 1550-4786

    Conference

    Conference: 17th IEEE International Conference on Data Mining Workshops, ICDM Workshops 2007
    Country: United States
    City: Omaha, NE
    Period: 10/28/07 to 10/31/07

    Fingerprint

    Statistical methods
    Acoustic waves
    Testing
    Experiments

    ASJC Scopus subject areas

    • Engineering(all)

    Cite this

    Muthukrishnan, S., Van Den Berg, E., & Wu, Y. (2007). Sequential change detection on data streams. In ICDM Workshops 2007 - Proceedings of the 17th IEEE International Conference on Data Mining Workshops (pp. 551-556). [4476721] (Proceedings - IEEE International Conference on Data Mining, ICDM). https://doi.org/10.1109/ICDMW.2007.89

    Sequential change detection on data streams. / Muthukrishnan, Shanmugavelayutham; Van Den Berg, Eric; Wu, Yihua.

    ICDM Workshops 2007 - Proceedings of the 17th IEEE International Conference on Data Mining Workshops. 2007. p. 551-556 4476721 (Proceedings - IEEE International Conference on Data Mining, ICDM).

    Muthukrishnan, S, Van Den Berg, E & Wu, Y 2007, Sequential change detection on data streams. in ICDM Workshops 2007 - Proceedings of the 17th IEEE International Conference on Data Mining Workshops., 4476721, Proceedings - IEEE International Conference on Data Mining, ICDM, pp. 551-556, 17th IEEE International Conference on Data Mining Workshops, ICDM Workshops 2007, Omaha, NE, United States, 10/28/07. https://doi.org/10.1109/ICDMW.2007.89
    Muthukrishnan S, Van Den Berg E, Wu Y. Sequential change detection on data streams. In ICDM Workshops 2007 - Proceedings of the 17th IEEE International Conference on Data Mining Workshops. 2007. p. 551-556. 4476721. (Proceedings - IEEE International Conference on Data Mining, ICDM). https://doi.org/10.1109/ICDMW.2007.89
    Muthukrishnan, Shanmugavelayutham ; Van Den Berg, Eric ; Wu, Yihua. / Sequential change detection on data streams. ICDM Workshops 2007 - Proceedings of the 17th IEEE International Conference on Data Mining Workshops. 2007. pp. 551-556 (Proceedings - IEEE International Conference on Data Mining, ICDM).
    @inproceedings{46fb34fffa5f41e69b40331e29d70ae7,
    title = "Sequential change detection on data streams",
    abstract = "Model-based declarative queries are becoming an attractive paradigm for interacting with many data stream applications. This has led to the development of techniques to accurately answer the queries using distributional models rather than raw values. The quintessential problem with this is that of detecting when there is a change in the input stream, which makes models stale and inaccurate. We adopt the sound statistical method of sequential hypothesis testing to study this problem on streams, without independence assumption. It yields algorithms that are fast, space-efficient, and oblivious to data's underlying distributions. Our experiments demonstrate the effectiveness of our methods to not only determine the existence of a change, but also the point where the change is initiated, relative to the ground truth we obtain. Our methods work seamlessly without window limitations inherent in prior work, thus have clearly shorter delays compared to alternative window-based solutions.",
    author = "Shanmugavelayutham Muthukrishnan and {Van Den Berg}, Eric and Yihua Wu",
    year = "2007",
    month = "12",
    day = "1",
    doi = "10.1109/ICDMW.2007.89",
    language = "English (US)",
    isbn = "0769530192",
    series = "Proceedings - IEEE International Conference on Data Mining, ICDM",
    pages = "551--556",
    booktitle = "ICDM Workshops 2007 - Proceedings of the 17th IEEE International Conference on Data Mining Workshops",

    }

    TY - GEN

    T1 - Sequential change detection on data streams

    AU - Muthukrishnan, Shanmugavelayutham

    AU - Van Den Berg, Eric

    AU - Wu, Yihua

    PY - 2007/12/1

    Y1 - 2007/12/1

    AB - Model-based declarative queries are becoming an attractive paradigm for interacting with many data stream applications. This has led to the development of techniques to accurately answer the queries using distributional models rather than raw values. The quintessential problem with this is that of detecting when there is a change in the input stream, which makes models stale and inaccurate. We adopt the sound statistical method of sequential hypothesis testing to study this problem on streams, without independence assumption. It yields algorithms that are fast, space-efficient, and oblivious to data's underlying distributions. Our experiments demonstrate the effectiveness of our methods to not only determine the existence of a change, but also the point where the change is initiated, relative to the ground truth we obtain. Our methods work seamlessly without window limitations inherent in prior work, thus have clearly shorter delays compared to alternative window-based solutions.

    UR - http://www.scopus.com/inward/record.url?scp=49549123783&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=49549123783&partnerID=8YFLogxK

    U2 - 10.1109/ICDMW.2007.89

    DO - 10.1109/ICDMW.2007.89

    M3 - Conference contribution

    SN - 0769530192

    SN - 9780769530192

    T3 - Proceedings - IEEE International Conference on Data Mining, ICDM

    SP - 551

    EP - 556

    BT - ICDM Workshops 2007 - Proceedings of the 17th IEEE International Conference on Data Mining Workshops

    ER -