Pre-processing and indexing techniques for constellation queries in big data

Amir Khatibi, Fabio Porto, Joao Guilherme Rittmeyer, Eduardo Ogasawara, Patrick Valduriez, Dennis Shasha

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Geometric patterns are defined by the spatial distribution of a set of objects. They can be found in many spatial datasets, such as seismic, astronomical, and transportation data. A particularly interesting geometric pattern is the Einstein cross, an astronomical phenomenon in which a single quasar is observed as four distinct sky objects when captured by Earth-based telescopes. Finding such crosses, as well as other geometric patterns, collectively referred to as constellation queries, is challenging because the potential number of sets of elements that could compose a shape is exponentially large in the size of the dataset and of the query pattern. In this paper we propose algorithms to optimize the computation of constellation queries. Our techniques involve pre-processing the query to reduce its dimensionality and indexing the data with a PH-tree to speed up the computation of star neighborhoods. We implemented our techniques in Spark and evaluated them through a series of experiments. The PH-tree indexing showed very good results and guarantees the completeness of query answers.
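The core difficulty named in the abstract (the number of candidate star sets grows combinatorially) and the general remedy of restricting candidates to spatial neighborhoods can be illustrated with a small sketch. It is not the authors' algorithm and does not implement a PH-tree: it swaps in a uniform-grid index as a simple stand-in, and the names `GridIndex`, `candidate_sets`, and `diam` (assumed here to be the query shape's diameter plus a matching tolerance) are hypothetical. Anchoring each candidate at its smallest-index member means every set whose diameter is at most `diam` is still generated, which loosely mirrors the completeness property the abstract attributes to the indexed approach.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterator, List, Tuple

Point = Tuple[float, float]


class GridIndex:
    """Uniform-grid spatial index: a simple stand-in for a PH-tree, NOT a PH-tree.

    Points are bucketed into square cells of side `cell`, so a radius query
    only inspects the cells overlapping the query circle's bounding box.
    """

    def __init__(self, points: List[Point], cell: float) -> None:
        self.points = points
        self.cell = cell
        self.buckets: Dict[Tuple[int, int], List[int]] = defaultdict(list)
        for i, (x, y) in enumerate(points):
            self.buckets[self._key(x, y)].append(i)

    def _key(self, x: float, y: float) -> Tuple[int, int]:
        return (math.floor(x / self.cell), math.floor(y / self.cell))

    def within(self, i: int, radius: float) -> List[int]:
        """Indices of points within Euclidean distance `radius` of point i (excluding i)."""
        cx, cy = self.points[i]
        x0, y0 = self._key(cx - radius, cy - radius)
        x1, y1 = self._key(cx + radius, cy + radius)
        found = []
        for gx in range(x0, x1 + 1):
            for gy in range(y0, y1 + 1):
                # .get() avoids creating empty buckets in the defaultdict
                for j in self.buckets.get((gx, gy), []):
                    if j != i and math.dist(self.points[i], self.points[j]) <= radius:
                        found.append(j)
        return found


def candidate_sets(index: GridIndex, k: int, diam: float) -> Iterator[Tuple[int, ...]]:
    """Yield k-subsets (sorted index tuples) whose members all lie within `diam`
    of the subset's smallest-index member.

    Every k-subset whose diameter is <= diam is yielded exactly once
    (completeness); some yielded sets may be wider than diam, so a later
    shape-matching step must still filter them.
    """
    for i in range(len(index.points)):
        nbrs = sorted(j for j in index.within(i, diam) if j > i)
        for rest in combinations(nbrs, k - 1):
            yield (i, *rest)


# Toy data (hypothetical): four points arranged like an Einstein cross,
# followed by 16 background stars on a widely spaced grid.
cross = [(10.0, 11.0), (11.0, 10.0), (10.0, 9.0), (9.0, 10.0)]
background = [(30.0 + 7.0 * a, 30.0 + 7.0 * b) for a in range(4) for b in range(4)]
stars = cross + background

diam = 2.5  # hypothetical: cross diameter (2.0) plus a matching tolerance
index = GridIndex(stars, cell=diam)
cands = list(candidate_sets(index, k=4, diam=diam))

print("brute-force 4-star sets:", math.comb(len(stars), 4))
print("neighborhood candidates:", cands)
```

On this toy data, brute force would examine C(20, 4) = 4845 four-star sets, while the neighborhood filter yields the single candidate `(0, 1, 2, 3)`. The grid is chosen only because it fits in a few lines; it is not a substitute for the PH-tree evaluated in the paper, nor for its Spark-based distribution.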

Original language: English (US)
Title of host publication: Big Data Analytics and Knowledge Discovery - 19th International Conference, DaWaK 2017, Proceedings
Publisher: Springer Verlag
Pages: 164-172
Number of pages: 9
Volume: 10440 LNCS
ISBN (Print): 9783319642826
DOI: https://doi.org/10.1007/978-3-319-64283-3_12
State: Published - 2017
Event: 19th International Conference on Big Data Analytics and Knowledge Discovery, DaWaK 2017 - Lyon, France
Duration: Aug 28 2017 → Aug 31 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10440 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 19th International Conference on Big Data Analytics and Knowledge Discovery, DaWaK 2017
Country: France
City: Lyon
Period: 8/28/17 → 8/31/17

Keywords

  • Constellation queries
  • Dataset pre-processing
  • Geometric shapes
  • PH-tree indexing
  • Query pre-processing
  • SQL extension

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)

Cite this

Khatibi, A., Porto, F., Rittmeyer, J. G., Ogasawara, E., Valduriez, P., & Shasha, D. (2017). Pre-processing and indexing techniques for constellation queries in big data. In Big Data Analytics and Knowledge Discovery - 19th International Conference, DaWaK 2017, Proceedings (Vol. 10440 LNCS, pp. 164-172). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10440 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-319-64283-3_12

Pre-processing and indexing techniques for constellation queries in big data. / Khatibi, Amir; Porto, Fabio; Rittmeyer, Joao Guilherme; Ogasawara, Eduardo; Valduriez, Patrick; Shasha, Dennis.

Big Data Analytics and Knowledge Discovery - 19th International Conference, DaWaK 2017, Proceedings. Vol. 10440 LNCS Springer Verlag, 2017. p. 164-172 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10440 LNCS).

Khatibi, A, Porto, F, Rittmeyer, JG, Ogasawara, E, Valduriez, P & Shasha, D 2017, Pre-processing and indexing techniques for constellation queries in big data. in Big Data Analytics and Knowledge Discovery - 19th International Conference, DaWaK 2017, Proceedings. vol. 10440 LNCS, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 10440 LNCS, Springer Verlag, pp. 164-172, 19th International Conference on Big Data Analytics and Knowledge Discovery, DaWaK 2017, Lyon, France, 8/28/17. https://doi.org/10.1007/978-3-319-64283-3_12

Khatibi A, Porto F, Rittmeyer JG, Ogasawara E, Valduriez P, Shasha D. Pre-processing and indexing techniques for constellation queries in big data. In Big Data Analytics and Knowledge Discovery - 19th International Conference, DaWaK 2017, Proceedings. Vol. 10440 LNCS. Springer Verlag. 2017. p. 164-172. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-319-64283-3_12

Khatibi, Amir ; Porto, Fabio ; Rittmeyer, Joao Guilherme ; Ogasawara, Eduardo ; Valduriez, Patrick ; Shasha, Dennis. / Pre-processing and indexing techniques for constellation queries in big data. Big Data Analytics and Knowledge Discovery - 19th International Conference, DaWaK 2017, Proceedings. Vol. 10440 LNCS Springer Verlag, 2017. pp. 164-172 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{52ba935628a0467dbe7bc6c1cb13c5fd,
title = "Pre-processing and indexing techniques for constellation queries in big data",
abstract = "Geometric patterns are defined by the spatial distribution of a set of objects. They can be found in many spatial datasets, such as seismic, astronomical, and transportation data. A particularly interesting geometric pattern is the Einstein cross, an astronomical phenomenon in which a single quasar is observed as four distinct sky objects when captured by Earth-based telescopes. Finding such crosses, as well as other geometric patterns, collectively referred to as constellation queries, is challenging because the potential number of sets of elements that could compose a shape is exponentially large in the size of the dataset and of the query pattern. In this paper we propose algorithms to optimize the computation of constellation queries. Our techniques involve pre-processing the query to reduce its dimensionality and indexing the data with a PH-tree to speed up the computation of star neighborhoods. We implemented our techniques in Spark and evaluated them through a series of experiments. The PH-tree indexing showed very good results and guarantees the completeness of query answers.",
keywords = "Constellation queries, Dataset pre-processing, Geometric shapes, PH-tree indexing, Query pre-processing, SQL extension",
author = "Amir Khatibi and Fabio Porto and Rittmeyer, {Joao Guilherme} and Eduardo Ogasawara and Patrick Valduriez and Dennis Shasha",
year = "2017",
doi = "10.1007/978-3-319-64283-3_12",
language = "English (US)",
isbn = "9783319642826",
volume = "10440 LNCS",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "164--172",
booktitle = "Big Data Analytics and Knowledge Discovery - 19th International Conference, DaWaK 2017, Proceedings",
address = "Germany",

}

TY - GEN

T1 - Pre-processing and indexing techniques for constellation queries in big data

AU - Khatibi, Amir

AU - Porto, Fabio

AU - Rittmeyer, Joao Guilherme

AU - Ogasawara, Eduardo

AU - Valduriez, Patrick

AU - Shasha, Dennis

PY - 2017

Y1 - 2017

AB - Geometric patterns are defined by the spatial distribution of a set of objects. They can be found in many spatial datasets, such as seismic, astronomical, and transportation data. A particularly interesting geometric pattern is the Einstein cross, an astronomical phenomenon in which a single quasar is observed as four distinct sky objects when captured by Earth-based telescopes. Finding such crosses, as well as other geometric patterns, collectively referred to as constellation queries, is challenging because the potential number of sets of elements that could compose a shape is exponentially large in the size of the dataset and of the query pattern. In this paper we propose algorithms to optimize the computation of constellation queries. Our techniques involve pre-processing the query to reduce its dimensionality and indexing the data with a PH-tree to speed up the computation of star neighborhoods. We implemented our techniques in Spark and evaluated them through a series of experiments. The PH-tree indexing showed very good results and guarantees the completeness of query answers.

KW - Constellation queries

KW - Dataset pre-processing

KW - Geometric shapes

KW - PH-tree indexing

KW - Query pre-processing

KW - SQL extension

UR - http://www.scopus.com/inward/record.url?scp=85028471280&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85028471280&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-64283-3_12

DO - 10.1007/978-3-319-64283-3_12

M3 - Conference contribution

AN - SCOPUS:85028471280

SN - 9783319642826

VL - 10440 LNCS

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 164

EP - 172

BT - Big Data Analytics and Knowledge Discovery - 19th International Conference, DaWaK 2017, Proceedings

PB - Springer Verlag

ER -