RadiusSketch: Massively distributed indexing of time series

Djamel Edine Yagoubi, Reza Akbarinia, Florent Masseglia, Dennis Shasha

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Performing similarity queries on hundreds of millions of time series is a challenge requiring both efficient indexing techniques and parallelization. We propose a sketch/random projection-based approach that scales nearly linearly in parallel environments and provides high-quality answers. We illustrate the performance of our approach, called RadiusSketch, on real and synthetic datasets of up to 1 terabyte and 500 million time series. The sketch method, as we have implemented it, is superior in both quality and response time to the state-of-the-art approach, iSAX2+. Even in the sequential case, it improves recall and precision by a factor of two while giving shorter response times. In a parallel environment with 32 processors, on both real and synthetic data, our parallel approach improves index construction time by a factor of up to 100 and query answering time by a factor of up to 15. Finally, our data structure makes use of idle computing time to improve recall and precision further.
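
The abstract does not describe how the sketches are built. As a rough illustration only, the following Python sketch shows a generic random ±1 projection of time series and how the distance between two sketches approximates the Euclidean distance between the original series. The function names, the ±1 projection, and the sizes used are assumptions made for this example; this is not the RadiusSketch implementation.

```python
import math
import random


def make_projection(length, sketch_size, seed=0):
    """Return sketch_size random +1/-1 vectors, each of the given length.

    Hypothetical helper: the +/-1 choice is an assumption for illustration.
    """
    rng = random.Random(seed)
    return [[rng.choice((-1.0, 1.0)) for _ in range(length)]
            for _ in range(sketch_size)]


def sketch(series, projection):
    """Project a series onto each random vector (one dot product per entry)."""
    return [sum(x * r for x, r in zip(series, row)) for row in projection]


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def estimated_distance(sketch_a, sketch_b):
    """Estimate the original Euclidean distance from two sketches.

    For +/-1 projections, each squared sketch difference has expectation
    equal to the squared distance, so we average over the sketch entries.
    """
    return euclidean(sketch_a, sketch_b) / math.sqrt(len(sketch_a))


if __name__ == "__main__":
    rng = random.Random(1)
    a = [rng.gauss(0, 1) for _ in range(256)]
    b = [rng.gauss(0, 1) for _ in range(256)]
    proj = make_projection(length=256, sketch_size=512, seed=42)
    print("true distance:     ", euclidean(a, b))
    print("estimated distance:", estimated_distance(sketch(a, proj), sketch(b, proj)))
```

In practice a sketch is useful when it is much shorter than the series it summarizes; the sizes here are chosen only to make the approximation easy to observe.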

Original language: English (US)
Title of host publication: Proceedings - 2017 International Conference on Data Science and Advanced Analytics, DSAA 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 262-271
Number of pages: 10
Volume: 2018-January
ISBN (Electronic): 9781509050048
DOI: https://doi.org/10.1109/DSAA.2017.49
State: Published - Jan 16 2018
Event: 4th International Conference on Data Science and Advanced Analytics, DSAA 2017 - Tokyo, Japan
Duration: Oct 19 2017 to Oct 21 2017

Other

Other: 4th International Conference on Data Science and Advanced Analytics, DSAA 2017
Country: Japan
City: Tokyo
Period: 10/19/17 to 10/21/17

Fingerprint

Indexing
Time series
Response Time
Query
Random Projection
Data structures
Synthetic Data
Parallelization
Linearly
Computing
Factors
Similarity

ASJC Scopus subject areas

  • Signal Processing
  • Information Systems and Management
  • Statistics, Probability and Uncertainty
  • Computer Networks and Communications

Cite this

Yagoubi, D. E., Akbarinia, R., Masseglia, F., & Shasha, D. (2018). RadiusSketch: Massively distributed indexing of time series. In Proceedings - 2017 International Conference on Data Science and Advanced Analytics, DSAA 2017 (Vol. 2018-January, pp. 262-271). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/DSAA.2017.49

RadiusSketch: Massively distributed indexing of time series. / Yagoubi, Djamel Edine; Akbarinia, Reza; Masseglia, Florent; Shasha, Dennis.

Proceedings - 2017 International Conference on Data Science and Advanced Analytics, DSAA 2017. Vol. 2018-January. Institute of Electrical and Electronics Engineers Inc., 2018. p. 262-271.

Yagoubi, DE, Akbarinia, R, Masseglia, F & Shasha, D 2018, RadiusSketch: Massively distributed indexing of time series. in Proceedings - 2017 International Conference on Data Science and Advanced Analytics, DSAA 2017. vol. 2018-January, Institute of Electrical and Electronics Engineers Inc., pp. 262-271, 4th International Conference on Data Science and Advanced Analytics, DSAA 2017, Tokyo, Japan, 10/19/17. https://doi.org/10.1109/DSAA.2017.49

Yagoubi DE, Akbarinia R, Masseglia F, Shasha D. RadiusSketch: Massively distributed indexing of time series. In Proceedings - 2017 International Conference on Data Science and Advanced Analytics, DSAA 2017. Vol. 2018-January. Institute of Electrical and Electronics Engineers Inc. 2018. p. 262-271. https://doi.org/10.1109/DSAA.2017.49

Yagoubi, Djamel Edine; Akbarinia, Reza; Masseglia, Florent; Shasha, Dennis. / RadiusSketch: Massively distributed indexing of time series. Proceedings - 2017 International Conference on Data Science and Advanced Analytics, DSAA 2017. Vol. 2018-January. Institute of Electrical and Electronics Engineers Inc., 2018. pp. 262-271.

@inproceedings{5a17deb8da714fbb840a1ada7e320f2d,
title = "RadiusSketch: Massively distributed indexing of time series",
abstract = "Performing similarity queries on hundreds of millions of time series is a challenge requiring both efficient indexing techniques and parallelization. We propose a sketch/random projection-based approach that scales nearly linearly in parallel environments and provides high-quality answers. We illustrate the performance of our approach, called RadiusSketch, on real and synthetic datasets of up to 1 terabyte and 500 million time series. The sketch method, as we have implemented it, is superior in both quality and response time to the state-of-the-art approach, iSAX2+. Even in the sequential case, it improves recall and precision by a factor of two while giving shorter response times. In a parallel environment with 32 processors, on both real and synthetic data, our parallel approach improves index construction time by a factor of up to 100 and query answering time by a factor of up to 15. Finally, our data structure makes use of idle computing time to improve recall and precision further.",
author = "Yagoubi, Djamel Edine and Akbarinia, Reza and Masseglia, Florent and Shasha, Dennis",
year = "2018",
month = "1",
day = "16",
doi = "10.1109/DSAA.2017.49",
language = "English (US)",
volume = "2018-January",
pages = "262--271",
booktitle = "Proceedings - 2017 International Conference on Data Science and Advanced Analytics, DSAA 2017",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY  - GEN
T1  - RadiusSketch
T2  - Massively distributed indexing of time series
AU  - Yagoubi, Djamel Edine
AU  - Akbarinia, Reza
AU  - Masseglia, Florent
AU  - Shasha, Dennis
PY  - 2018/1/16
Y1  - 2018/1/16
AB  - Performing similarity queries on hundreds of millions of time series is a challenge requiring both efficient indexing techniques and parallelization. We propose a sketch/random projection-based approach that scales nearly linearly in parallel environments and provides high-quality answers. We illustrate the performance of our approach, called RadiusSketch, on real and synthetic datasets of up to 1 terabyte and 500 million time series. The sketch method, as we have implemented it, is superior in both quality and response time to the state-of-the-art approach, iSAX2+. Even in the sequential case, it improves recall and precision by a factor of two while giving shorter response times. In a parallel environment with 32 processors, on both real and synthetic data, our parallel approach improves index construction time by a factor of up to 100 and query answering time by a factor of up to 15. Finally, our data structure makes use of idle computing time to improve recall and precision further.
UR  - http://www.scopus.com/inward/record.url?scp=85046285778&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85046285778&partnerID=8YFLogxK
U2  - 10.1109/DSAA.2017.49
DO  - 10.1109/DSAA.2017.49
M3  - Conference contribution
AN  - SCOPUS:85046285778
VL  - 2018-January
SP  - 262
EP  - 271
BT  - Proceedings - 2017 International Conference on Data Science and Advanced Analytics, DSAA 2017
PB  - Institute of Electrical and Electronics Engineers Inc.
ER  -