Identifying representative trends in massive time series data sets using sketches

Piotr Indyk, Nick Koudas, Shanmugavelayutham Muthukrishnan

    Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

    Abstract

    Many data stores, including scientific and financial databases, business warehouses and network repositories, contain time series data. Time series data depict trends for an observed value (e.g., the value of a stock or the number of bytes sent on a router interface) as a function of time. Analysis of the trends over different time windows is of great interest. In this paper, we formalize problems of identifying various 'representative' trends in time series data. Informally, an interval of observations in a time series is defined to be a representative trend if its distance from other intervals satisfies certain properties, for suitably defined distance functions between time series intervals. Natural trends of interest, such as periodic or average trends, are examples of representative trends. We present efficient algorithms for analyzing massive time series data sets for representative trends over arbitrary windows of interest. Our algorithms are highly processor and I/O efficient; they are approximate but provide probabilistic guarantees for the approximations achieved. Our approach for identifying representative trends relies on a dimensionality reduction technique that replaces each interval by a 'sketch', which is a low-dimensional vector. We present efficient algorithms to construct such sketches using a pool of select sketches that we precompute using polynomial convolutions. Using such sketches, we can compute representative trends accurately. Finally, we present results of a detailed experimental study of our technique on very large real data sets. Our results show that, compared to approaches that determine representative trends exactly, our approach achieves significant performance gains with only a small loss in accuracy.
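    The abstract names the ingredients (random low-dimensional sketches of intervals, computed via polynomial convolution, then compared by distance) without spelling them out. The following is a minimal sketch under stated assumptions, not the paper's algorithm: it assumes Gaussian random projections (a standard Johnson-Lindenstrauss-style choice), uses a naive polynomial multiplication where an FFT-based one would be used at scale, and defines a hypothetical "average trend" as the window whose sketch has the smallest summed distance to all other windows (a medoid). The function names `poly_mul`, `window_sketches`, `medoid_index`, `approx_average_trend` and `exact_average_trend`, and the parameters `w`, `k` and `seed`, are illustrative inventions. The paper's construction of sketches for arbitrary windows from a precomputed pool is not attempted here.

```python
import math
import random


def poly_mul(a, b):
    """Naive polynomial multiplication, i.e. the full linear convolution of a and b.

    At scale an FFT-based multiply would replace this O(len(a) * len(b)) loop;
    the naive version keeps the sketch dependency-free.
    """
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def window_sketches(series, w, k, seed=0):
    """Return a k-dimensional sketch for every length-w window of `series`.

    Assumption (not from the paper): coordinate j of the sketch of window i is
    the dot product of series[i:i + w] with a Gaussian random vector r_j,
    scaled by 1/sqrt(k), so sketch distances estimate L2 window distances.
    """
    rng = random.Random(seed)
    n = len(series)
    sketches = [[0.0] * k for _ in range(n - w + 1)]
    for j in range(k):
        r = [rng.gauss(0.0, 1.0) for _ in range(w)]
        # Multiplying by the reversed vector puts sum_t series[i + t] * r[t],
        # the dot product with window i, at coefficient i + w - 1.
        conv = poly_mul(series, r[::-1])
        for i in range(n - w + 1):
            sketches[i][j] = conv[i + w - 1] / math.sqrt(k)
    return sketches


def dist(u, v):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def medoid_index(vectors):
    """Index of the vector with the smallest summed distance to all vectors."""
    costs = [sum(dist(u, v) for v in vectors) for u in vectors]
    return costs.index(min(costs))


def approx_average_trend(series, w, k=16, seed=0):
    """Hypothetical 'average trend': the medoid window, judged on k-dim sketches."""
    return medoid_index(window_sketches(series, w, k, seed))


def exact_average_trend(series, w):
    """The same medoid criterion evaluated on the raw length-w windows."""
    windows = [series[i:i + w] for i in range(len(series) - w + 1)]
    return medoid_index(windows)
```

    With `w = 1`, every sketch is the scalar observation times one fixed random vector, so sketch distances are exactly proportional to true distances and both functions return the index of the median value (for example, index 0 for `[5.0, 1.0, 9.0, 3.0, 7.0]`). For longer windows the two answers can differ, which mirrors the accuracy-for-speed trade-off described in the abstract; the quadratic scan over candidate windows is kept only for clarity.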

    Original language: English (US)
    Title of host publication: Proceedings of the 26th International Conference on Very Large Data Bases, VLDB'00
    Pages: 363-372
    Number of pages: 10
    State: Published - Dec 1 2000
    Event: 26th International Conference on Very Large Data Bases, VLDB 2000 - Cairo, Egypt
    Duration: Sep 10 2000 to Sep 14 2000

    Publication series

    Name: Proceedings of the 26th International Conference on Very Large Data Bases, VLDB'00

    ASJC Scopus subject areas

    • Hardware and Architecture
    • Information Systems
    • Software
    • Information Systems and Management

    Cite this

    Indyk, P., Koudas, N., & Muthukrishnan, S. (2000). Identifying representative trends in massive time series data sets using sketches. In Proceedings of the 26th International Conference on Very Large Data Bases, VLDB'00 (pp. 363-372). (Proceedings of the 26th International Conference on Very Large Data Bases, VLDB'00).

    @inproceedings{90686fae8857417d9c124cc55d76002d,
    title = "Identifying representative trends in massive time series data sets using sketches",
    author = "Piotr Indyk and Nick Koudas and Shanmugavelayutham Muthukrishnan",
    year = "2000",
    month = "12",
    day = "1",
    language = "English (US)",
    isbn = "1558607153",
    series = "Proceedings of the 26th International Conference on Very Large Data Bases, VLDB'00",
    pages = "363--372",
    booktitle = "Proceedings of the 26th International Conference on Very Large Data Bases, VLDB'00",

    }

    TY - GEN

    T1 - Identifying representative trends in massive time series data sets using sketches

    AU - Indyk, Piotr

    AU - Koudas, Nick

    AU - Muthukrishnan, Shanmugavelayutham

    PY - 2000/12/1

    Y1 - 2000/12/1

    UR - http://www.scopus.com/inward/record.url?scp=32344434787&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=32344434787&partnerID=8YFLogxK

    M3 - Conference contribution

    AN - SCOPUS:32344434787

    SN - 1558607153

    SN - 9781558607156

    T3 - Proceedings of the 26th International Conference on Very Large Data Bases, VLDB'00

    SP - 363

    EP - 372

    BT - Proceedings of the 26th International Conference on Very Large Data Bases, VLDB'00

    ER -