### Abstract

Many data stores, including scientific and financial databases, business warehouses and network repositories, contain time series data. Time series data depict trends for an observed value e.g., value of a stock, number of bytes sent on a router interface, etc., as a function of time. Analysis of the trends over different time windows is of great interest. In this paper, we formalize problems of identifying various 'representative' trends in time series data. Informally, an interval of observations in a time series is defined to be a representative trend if its distance from other intervals satisfy certain properties, for suitably defined distance functions between time series intervals. Natural trends of interest such as periodic or average trends are examples of representative trends. We present efficient algorithms for analyzing massive time series data sets for representative trends over arbitrary windows of interest. Our algorithms are highly processor and 10 efficient; they are approximate but provide probabilistic guarantees for the approximations achieved. Our approach for identifying representative trends relies on a dimensionality reduction technique that replaces each interval by a 'sketch' which is a low dimensional vector. We present efficient algorithms to construct such sketches using a pool of select sketches that we precompute using polynomial convolutions. Using such sketches, we can compute representative trends accurately. Finally, we present results of a detailed experimental study of our technique on very large real data sets. Our results show that, compared to approaches that determine representative trends exactly, our approach shows significant performance gains with only a small loss in accuracy.

Original language | English (US) |
---|---|

Title of host publication | Proceedings of the 26th International Conference on Very Large Data Bases, VLDB'00 |

Pages | 363-372 |

Number of pages | 10 |

State | Published - Dec 1 2000 |

Event | 26th International Conference on Very Large Data Bases, VLDB 2000 - Cairo, Egypt Duration: Sep 10 2000 → Sep 14 2000 |

### Publication series

Name | Proceedings of the 26th International Conference on Very Large Data Bases, VLDB'00 |
---|

### Other

Other | 26th International Conference on Very Large Data Bases, VLDB 2000 |
---|---|

Country | Egypt |

City | Cairo |

Period | 9/10/00 → 9/14/00 |

### Fingerprint

### ASJC Scopus subject areas

- Hardware and Architecture
- Information Systems
- Software
- Information Systems and Management

### Cite this

*Proceedings of the 26th International Conference on Very Large Data Bases, VLDB'00*(pp. 363-372). (Proceedings of the 26th International Conference on Very Large Data Bases, VLDB'00).

**Identifying representative trends in massive time series data sets using sketches.** / Indyk, Piotr; Koudas, Nick; Muthukrishnan, Shanmugavelayutham.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

*Proceedings of the 26th International Conference on Very Large Data Bases, VLDB'00.*Proceedings of the 26th International Conference on Very Large Data Bases, VLDB'00, pp. 363-372, 26th International Conference on Very Large Data Bases, VLDB 2000, Cairo, Egypt, 9/10/00.

}

TY - GEN

T1 - Identifying representative trends in massive time series data sets using sketches

AU - Indyk, Piotr

AU - Koudas, Nick

AU - Muthukrishnan, Shanmugavelayutham

PY - 2000/12/1

Y1 - 2000/12/1

N2 - Many data stores, including scientific and financial databases, business warehouses and network repositories, contain time series data. Time series data depict trends for an observed value e.g., value of a stock, number of bytes sent on a router interface, etc., as a function of time. Analysis of the trends over different time windows is of great interest. In this paper, we formalize problems of identifying various 'representative' trends in time series data. Informally, an interval of observations in a time series is defined to be a representative trend if its distance from other intervals satisfy certain properties, for suitably defined distance functions between time series intervals. Natural trends of interest such as periodic or average trends are examples of representative trends. We present efficient algorithms for analyzing massive time series data sets for representative trends over arbitrary windows of interest. Our algorithms are highly processor and 10 efficient; they are approximate but provide probabilistic guarantees for the approximations achieved. Our approach for identifying representative trends relies on a dimensionality reduction technique that replaces each interval by a 'sketch' which is a low dimensional vector. We present efficient algorithms to construct such sketches using a pool of select sketches that we precompute using polynomial convolutions. Using such sketches, we can compute representative trends accurately. Finally, we present results of a detailed experimental study of our technique on very large real data sets. Our results show that, compared to approaches that determine representative trends exactly, our approach shows significant performance gains with only a small loss in accuracy.

AB - Many data stores, including scientific and financial databases, business warehouses and network repositories, contain time series data. Time series data depict trends for an observed value e.g., value of a stock, number of bytes sent on a router interface, etc., as a function of time. Analysis of the trends over different time windows is of great interest. In this paper, we formalize problems of identifying various 'representative' trends in time series data. Informally, an interval of observations in a time series is defined to be a representative trend if its distance from other intervals satisfy certain properties, for suitably defined distance functions between time series intervals. Natural trends of interest such as periodic or average trends are examples of representative trends. We present efficient algorithms for analyzing massive time series data sets for representative trends over arbitrary windows of interest. Our algorithms are highly processor and 10 efficient; they are approximate but provide probabilistic guarantees for the approximations achieved. Our approach for identifying representative trends relies on a dimensionality reduction technique that replaces each interval by a 'sketch' which is a low dimensional vector. We present efficient algorithms to construct such sketches using a pool of select sketches that we precompute using polynomial convolutions. Using such sketches, we can compute representative trends accurately. Finally, we present results of a detailed experimental study of our technique on very large real data sets. Our results show that, compared to approaches that determine representative trends exactly, our approach shows significant performance gains with only a small loss in accuracy.

UR - http://www.scopus.com/inward/record.url?scp=32344434787&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=32344434787&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:32344434787

SN - 1558607153

SN - 9781558607156

T3 - Proceedings of the 26th International Conference on Very Large Data Bases, VLDB'00

SP - 363

EP - 372

BT - Proceedings of the 26th International Conference on Very Large Data Bases, VLDB'00

ER -