### Abstract

Data arriving in time order (a data stream) arises in fields including physics, finance, medicine, and music. Often the data comes from sensors (in physics and medicine, for example) whose data rates continue to increase dramatically as sensor technology improves. Further, the number of sensors is increasing, so correlating data between sensors becomes ever more critical in order to distill knowledge from the data. In many applications, such as finance, recent correlations are of far more interest than long-term correlations, so correlation over sliding windows (windowed correlation) is the desired operation. Fast response is also desirable in many applications (e.g., to aim a telescope at an activity of interest or to perform a stock trade). These three factors - data size, windowed correlation, and fast response - motivate this work. Previous work [10, 14] showed how to compute Pearson correlation using Fast Fourier Transforms and wavelet transforms, but such techniques do not work for time series in which the energy is spread over many frequency components, so that the series resembles white noise. For such "uncooperative" time series, this paper shows how to combine several simple techniques - sketches (random projections), convolution, structured random vectors, grid structures, and combinatorial design - to achieve high-performance windowed Pearson correlation over a variety of data sets.
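
As a rough illustration of the sketch idea named in the abstract, the Python snippet below estimates the Pearson correlation of two windows from random projections. It is a minimal sketch under stated assumptions, not the paper's algorithm: it relies only on the standard identity that, for windows normalized to zero mean and unit length, the correlation equals 1 - d²/2, where d is their Euclidean distance, and on the fact that random ±1 projections preserve squared distances in expectation. The function names, the window length `n`, the sketch size `k`, and the synthetic data are all hypothetical choices made for this example; the convolution, structured random vector, grid, and combinatorial design components are not shown.

```python
import numpy as np


def znorm(window):
    """Shift a window to zero mean and scale it to unit Euclidean length."""
    centered = window - window.mean()
    return centered / np.linalg.norm(centered)


def pearson(x, y):
    """Exact Pearson correlation: the dot product of the normalized windows."""
    return float(znorm(x) @ znorm(y))


def sketch(window, projections):
    """Project a normalized window onto k random +/-1 vectors.

    Dividing by sqrt(k) makes squared distances between sketches
    unbiased estimates of squared distances between the windows.
    """
    k = projections.shape[0]
    return projections @ znorm(window) / np.sqrt(k)


def corr_from_sketches(sx, sy):
    """Estimate correlation from two sketches via corr = 1 - d^2 / 2."""
    d2 = float(np.sum((sx - sy) ** 2))
    return 1.0 - d2 / 2.0


# Hypothetical sizes and synthetic noise-like data (assumptions for the demo).
rng = np.random.default_rng(0)
n, k = 256, 64
projections = rng.choice([-1.0, 1.0], size=(k, n))

x = rng.standard_normal(n)
y = 0.8 * x + 0.6 * rng.standard_normal(n)  # correlated with x by construction

exact = pearson(x, y)
approx = corr_from_sketches(sketch(x, projections), sketch(y, projections))
print(f"exact={exact:.3f} approx={approx:.3f}")
```

Each sketch has `k` entries instead of `n`, so comparing many pairs of windows touches less data per pair; the price is an estimation error that shrinks as `k` grows.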

Original language | English (US) |
---|---|
Title of host publication | KDD-2005 - Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining |
Editors | R.L. Grossman, R. Bayardo, K. Bennett, J. Vaidya |
Pages | 743-749 |
Number of pages | 7 |
DOIs | https://doi.org/10.1145/1081870.1081966 |
State | Published - 2005 |
Event | KDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - Chicago, IL, United States; Duration: Aug 21 2005 → Aug 24 2005 |

### Other

Other | KDD-2005: 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining |
---|---|
Country | United States |
City | Chicago, IL |
Period | 8/21/05 → 8/24/05 |

### Keywords

- Correlation
- Randomized algorithms
- Time series

### ASJC Scopus subject areas

- Information Systems

### Cite this

*KDD-2005 - Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining* (pp. 743-749). https://doi.org/10.1145/1081870.1081966

**Fast window correlations over uncooperative time series.** / Cole, Richard; Shasha, Dennis; Zhao, Xiaojian.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution
