ATreeGrep

Approximate searching in unordered trees

Dennis Shasha, J. T L Wang, Huiyuan Shan, Kaizhong Zhang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

An unordered labeled tree is a tree in which each node has a string label and the parent-child relationship is significant, but the order among siblings is unimportant. This paper presents an approach to the nearest neighbor search problem for these trees. Given a database D of unordered labeled trees and a query tree Q, the goal is to find those trees in D that "approximately" contain Q. Our approach is based on storing the paths of the trees in a suffix array and then counting the number of mismatching paths between the query tree and a data tree. To speed up a search, we use a hash-based technique to filter out unqualified data trees at an early stage of the search. Experimental results obtained by running our techniques on phylogenetic trees and synthetic data demonstrate the good performance of the proposed approach. We also discuss the use of our work in XML and scientific database management.
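The path-counting idea in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: a plain list scan stands in for the suffix array, and a set of parent-child label pairs stands in for the hash-based filter. The tree encoding `(label, children)`, the function names, and the rule that a query path matches when it appears as a contiguous run inside some root-to-leaf path of the data tree are all assumptions made for the example.

```python
# Sketch only: trees are (label, [children]) tuples; all names are hypothetical.

def root_to_leaf_paths(tree):
    """Return every root-to-leaf path of the tree as a tuple of labels."""
    label, children = tree
    if not children:
        return [(label,)]
    return [(label,) + p for child in children for p in root_to_leaf_paths(child)]

def contains_run(path, run):
    """True if `run` occurs as a contiguous slice of `path`."""
    n = len(run)
    return any(path[i:i + n] == run for i in range(len(path) - n + 1))

def edge_pairs(paths):
    """Set of (parent label, child label) pairs along the given paths."""
    return {(p[i], p[i + 1]) for p in paths for i in range(len(p) - 1)}

def mismatches(query, data):
    """Count query paths that match no root-to-leaf path of the data tree."""
    data_paths = root_to_leaf_paths(data)
    return sum(
        1
        for q in root_to_leaf_paths(query)
        if not any(contains_run(d, q) for d in data_paths)
    )

def search(database, query, max_mismatches):
    """Return (name, mismatch count) for data trees within the mismatch budget."""
    query_paths = root_to_leaf_paths(query)
    hits = []
    for name, data in database.items():
        # Cheap filter (assumed stand-in for the hash-based step): a query path
        # using a label pair absent from the data tree cannot match, so this
        # count is a lower bound on the true number of mismatching paths.
        pairs = edge_pairs(root_to_leaf_paths(data))
        lower_bound = sum(1 for q in query_paths if not (edge_pairs([q]) <= pairs))
        if lower_bound > max_mismatches:
            continue
        m = mismatches(query, data)
        if m <= max_mismatches:
            hits.append((name, m))
    return hits

# t1 and t2 differ only in sibling order, so they yield the same set of paths.
t1 = ("a", [("b", [("c", []), ("d", [])]), ("e", [])])
t2 = ("a", [("e", []), ("b", [("d", []), ("c", [])])])
t3 = ("x", [("y", []), ("z", [])])
database = {"t1": t1, "t2": t2, "t3": t3}
query = ("b", [("d", []), ("c", []), ("f", [])])

print(search(database, query, max_mismatches=1))
```

Because only label paths are compared, the order among siblings never enters the computation, which is what makes the comparison suitable for unordered trees. The sketch ignores repeated identical paths; handling them would need multiset counts.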

Original language: English (US)
Title of host publication: Proceedings - 14th International Conference on Scientific and Statistical Database Management, SSDBM 2002
Publisher: IEEE Computer Society
Pages: 89-98
Number of pages: 10
Volume: 2002-January
ISBN (Print): 0769516327
DOI: https://doi.org/10.1109/SSDM.2002.1029709
State: Published - 2002
Event: 14th International Conference on Scientific and Statistical Database Management, SSDBM 2002 - Edinburgh, Scotland, United Kingdom
Duration: Jul 24 2002 to Jul 26 2002

Other

Other: 14th International Conference on Scientific and Statistical Database Management, SSDBM 2002
Country: United Kingdom
City: Edinburgh, Scotland
Period: 7/24/02 to 7/26/02

Keywords

  • Algorithm design and analysis
  • Change detection algorithms
  • Computer science
  • Decision trees
  • Filters
  • Nearest neighbor searches
  • Object oriented databases
  • Object oriented modeling
  • Phylogeny
  • XML

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Shasha, D., Wang, J. T. L., Shan, H., & Zhang, K. (2002). ATreeGrep: Approximate searching in unordered trees. In Proceedings - 14th International Conference on Scientific and Statistical Database Management, SSDBM 2002 (Vol. 2002-January, pp. 89-98). [1029709] IEEE Computer Society. https://doi.org/10.1109/SSDM.2002.1029709

@inproceedings{aeccdb5a99a347a8a80f7af3b2210e4b,
title = "ATreeGrep: Approximate searching in unordered trees",
abstract = "An unordered labeled tree is a tree in which each node has a string label and the parent-child relationship is significant, but the order among siblings is unimportant. This paper presents an approach to the nearest neighbor search problem for these trees. Given a database D of unordered labeled trees and a query tree Q, the goal is to find those trees in D that {"}approximately{"} contain Q. Our approach is based on storing the paths of the trees in a suffix array and then counting the number of mismatching paths between the query tree and a data tree. To speed up a search, we use a hash-based technique to filter out unqualified data trees at an early stage of the search. Experimental results obtained by running our techniques on phylogenetic trees and synthetic data demonstrate the good performance of the proposed approach. We also discuss the use of our work in XML and scientific database management.",
keywords = "Algorithm design and analysis, Change detection algorithms, Computer science, Decision trees, Filters, Nearest neighbor searches, Object oriented databases, Object oriented modeling, Phylogeny, XML",
author = "Dennis Shasha and Wang, {J. T L} and Huiyuan Shan and Kaizhong Zhang",
year = "2002",
doi = "10.1109/SSDM.2002.1029709",
language = "English (US)",
isbn = "0769516327",
volume = "2002-January",
pages = "89--98",
booktitle = "Proceedings - 14th International Conference on Scientific and Statistical Database Management, SSDBM 2002",
publisher = "IEEE Computer Society",

}

TY - GEN

T1 - ATreeGrep

T2 - Approximate searching in unordered trees

AU - Shasha, Dennis

AU - Wang, J. T L

AU - Shan, Huiyuan

AU - Zhang, Kaizhong

PY - 2002

Y1 - 2002

AB - An unordered labeled tree is a tree in which each node has a string label and the parent-child relationship is significant, but the order among siblings is unimportant. This paper presents an approach to the nearest neighbor search problem for these trees. Given a database D of unordered labeled trees and a query tree Q, the goal is to find those trees in D that "approximately" contain Q. Our approach is based on storing the paths of the trees in a suffix array and then counting the number of mismatching paths between the query tree and a data tree. To speed up a search, we use a hash-based technique to filter out unqualified data trees at an early stage of the search. Experimental results obtained by running our techniques on phylogenetic trees and synthetic data demonstrate the good performance of the proposed approach. We also discuss the use of our work in XML and scientific database management.

KW - Algorithm design and analysis

KW - Change detection algorithms

KW - Computer science

KW - Decision trees

KW - Filters

KW - Nearest neighbor searches

KW - Object oriented databases

KW - Object oriented modeling

KW - Phylogeny

KW - XML

UR - http://www.scopus.com/inward/record.url?scp=84948667724&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84948667724&partnerID=8YFLogxK

U2 - 10.1109/SSDM.2002.1029709

DO - 10.1109/SSDM.2002.1029709

M3 - Conference contribution

SN - 0769516327

VL - 2002-January

SP - 89

EP - 98

BT - Proceedings - 14th International Conference on Scientific and Statistical Database Management, SSDBM 2002

PB - IEEE Computer Society

ER -