ParCorr: efficient parallel methods to identify similar time series pairs across sliding windows

Djamel Edine Yagoubi, Reza Akbarinia, Boyan Kolev, Oleksandra Levchenko, Florent Masseglia, Patrick Valduriez, Dennis Shasha

Research output: Contribution to journal (Article)

Abstract

Consider the problem of finding the highly correlated pairs of time series over a time window and then sliding that window to find the highly correlated pairs over successive co-temporous windows such that each successive window starts only a little time after the previous window. Doing this efficiently and in parallel could help in applications such as sensor fusion, financial trading, or communications network monitoring, to name a few. We have developed a parallel incremental random vector/sketching approach to this problem and compared it with the state-of-the-art nearest neighbor method iSAX. Whereas iSAX achieves 100% recall and precision for Euclidean distance, the sketching approach is, empirically, at least 10 times faster and achieves 95% recall and 100% precision on real and simulated data. For many applications this speedup is worth the minor reduction in recall. Our method scales up to 100 million time series and scales linearly in its expensive steps (but quadratic in the less expensive ones).
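
The abstract names a random vector/sketching approach but gives no implementation details, so the following is only a minimal illustrative sketch of the general idea and not the ParCorr implementation. It assumes that each window is z-normalized, that the random vectors have ±1 entries, and that correlation is recovered from the identity corr = 1 - d^2 / (2n) for z-normalized windows of length n. All function names and parameters (`sketch`, `make_random_vectors`, `num_vectors`, `step`, and so on) are hypothetical, and the incremental update and parallel execution described in the paper are omitted.

```python
import math
import random


def z_normalize(window):
    """Rescale a window to zero mean and unit (population) standard deviation."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in window) / n)
    if std == 0.0:
        raise ValueError("constant window has undefined correlation")
    return [(v - mean) / std for v in window]


def pearson(x, y):
    """Exact Pearson correlation of two equal-length windows."""
    zx, zy = z_normalize(x), z_normalize(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


def make_random_vectors(num_vectors, window_len, seed=0):
    """Draw +1/-1 random vectors (an assumed choice) shared by all series."""
    rng = random.Random(seed)
    return [[rng.choice((-1.0, 1.0)) for _ in range(window_len)]
            for _ in range(num_vectors)]


def sketch(window, random_vectors):
    """Dot products of the z-normalized window with each random vector."""
    z = z_normalize(window)
    return [sum(a * r for a, r in zip(z, vec)) for vec in random_vectors]


def estimated_correlation(sketch_x, sketch_y, window_len):
    """Estimate correlation from two sketches via corr = 1 - d^2 / (2n)."""
    k = len(sketch_x)
    # The mean squared sketch difference estimates the squared
    # Euclidean distance between the two z-normalized windows.
    d2 = sum((a - b) ** 2 for a, b in zip(sketch_x, sketch_y)) / k
    return 1.0 - d2 / (2.0 * window_len)


def sliding_windows(series, window_len, step):
    """Yield (start, window) pairs; each window starts `step` points later."""
    for start in range(0, len(series) - window_len + 1, step):
        yield start, series[start:start + window_len]


if __name__ == "__main__":
    # Two synthetic, strongly correlated series (illustrative data only).
    a = [math.sin(0.3 * i) for i in range(200)]
    b = [v + 0.05 * math.cos(1.7 * i) for i, v in enumerate(a)]
    vectors = make_random_vectors(num_vectors=200, window_len=64, seed=1)
    for (start, wa), (_, wb) in zip(sliding_windows(a, 64, 32),
                                    sliding_windows(b, 64, 32)):
        est = estimated_correlation(sketch(wa, vectors), sketch(wb, vectors), 64)
        print(start, round(pearson(wa, wb), 3), round(est, 3))
```

In this sketch each series is reduced to `num_vectors` numbers per window, so candidate pairs can be compared on short sketches instead of full windows. The estimate is approximate, which fits the abstract's trade of some recall for speed.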

Original language: English (US)
Pages (from-to): 1481-1507
Number of pages: 27
Journal: Data Mining and Knowledge Discovery
Volume: 32
Issue number: 5
DOI: 10.1007/s10618-018-0580-z
State: Published - Sep 1 2018

Fingerprint

Time series
Telecommunication networks
Fusion reactions
Monitoring
Sensors

Keywords

  • Data mining
  • Data stream processing
  • Distributed computing
  • Time series analysis

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Computer Networks and Communications

Cite this

ParCorr: efficient parallel methods to identify similar time series pairs across sliding windows. / Yagoubi, Djamel Edine; Akbarinia, Reza; Kolev, Boyan; Levchenko, Oleksandra; Masseglia, Florent; Valduriez, Patrick; Shasha, Dennis.

In: Data Mining and Knowledge Discovery, Vol. 32, No. 5, 01.09.2018, p. 1481-1507.

@article{d50f40e0c5cc4f85bb445e8285aba30e,
title = "ParCorr: efficient parallel methods to identify similar time series pairs across sliding windows",
abstract = "Consider the problem of finding the highly correlated pairs of time series over a time window and then sliding that window to find the highly correlated pairs over successive co-temporous windows such that each successive window starts only a little time after the previous window. Doing this efficiently and in parallel could help in applications such as sensor fusion, financial trading, or communications network monitoring, to name a few. We have developed a parallel incremental random vector/sketching approach to this problem and compared it with the state-of-the-art nearest neighbor method iSAX. Whereas iSAX achieves 100{\%} recall and precision for Euclidean distance, the sketching approach is, empirically, at least 10 times faster and achieves 95{\%} recall and 100{\%} precision on real and simulated data. For many applications this speedup is worth the minor reduction in recall. Our method scales up to 100 million time series and scales linearly in its expensive steps (but quadratic in the less expensive ones).",
keywords = "Data mining, Data stream processing, Distributed computing, Time series analysis",
author = "Yagoubi, {Djamel Edine} and Reza Akbarinia and Boyan Kolev and Oleksandra Levchenko and Florent Masseglia and Patrick Valduriez and Dennis Shasha",
year = "2018",
month = "9",
day = "1",
doi = "10.1007/s10618-018-0580-z",
language = "English (US)",
volume = "32",
pages = "1481--1507",
journal = "Data Mining and Knowledge Discovery",
issn = "1384-5810",
publisher = "Springer Netherlands",
number = "5",

}

TY - JOUR

T1 - ParCorr

T2 - efficient parallel methods to identify similar time series pairs across sliding windows

AU - Yagoubi, Djamel Edine

AU - Akbarinia, Reza

AU - Kolev, Boyan

AU - Levchenko, Oleksandra

AU - Masseglia, Florent

AU - Valduriez, Patrick

AU - Shasha, Dennis

PY - 2018/9/1

Y1 - 2018/9/1

N2 - Consider the problem of finding the highly correlated pairs of time series over a time window and then sliding that window to find the highly correlated pairs over successive co-temporous windows such that each successive window starts only a little time after the previous window. Doing this efficiently and in parallel could help in applications such as sensor fusion, financial trading, or communications network monitoring, to name a few. We have developed a parallel incremental random vector/sketching approach to this problem and compared it with the state-of-the-art nearest neighbor method iSAX. Whereas iSAX achieves 100% recall and precision for Euclidean distance, the sketching approach is, empirically, at least 10 times faster and achieves 95% recall and 100% precision on real and simulated data. For many applications this speedup is worth the minor reduction in recall. Our method scales up to 100 million time series and scales linearly in its expensive steps (but quadratic in the less expensive ones).

KW - Data mining

KW - Data stream processing

KW - Distributed computing

KW - Time series analysis

UR - http://www.scopus.com/inward/record.url?scp=85051745634&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85051745634&partnerID=8YFLogxK

U2 - 10.1007/s10618-018-0580-z

DO - 10.1007/s10618-018-0580-z

M3 - Article

AN - SCOPUS:85051745634

VL - 32

SP - 1481

EP - 1507

JO - Data Mining and Knowledge Discovery

JF - Data Mining and Knowledge Discovery

SN - 1384-5810

IS - 5

ER -