Approximate Tree Matching in the Presence of Variable Length Don't Cares

K. Z. Zhang, Dennis Shasha, J. T. L. Wang

Research output: Contribution to journal (Article)

Abstract

Ordered labeled trees are trees in which the sibling order matters. This paper presents algorithms for three problems having to do with approximate matching for such trees with variable length don't cares (VLDCs). In strings, a VLDC symbol in the pattern may substitute for zero or more symbols in the data string. For example, if "com*er" is the pattern, then the "*" would substitute for the substring "put" when matching the data string "computer." Approximate VLDC matching in strings means that after the best possible substitution, the pattern still need not be the same as the data string for a match to be allowed. For example, "com*er" matches "counter" within distance one (representing the cost of removing the "m" from "com*er" and having the "*" substitute for "unt"). We generalize approximate VLDC string matching to three algorithms for approximate VLDC matching on trees. The time complexity of our algorithms is O(|P| × |D| × min(depth(P), leaves(P)) × min(depth(D), leaves(D))) (where |P| and |D| are the number of nodes respectively of the pattern P and the data tree D), the same as for the best approximate tree matching algorithm without VLDCs previously reported.
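
The string case described in the abstract can be illustrated with a short dynamic program. This is only a sketch of approximate VLDC matching on strings, not the paper's tree algorithms: the function name `vldc_distance`, the unit cost for every insertion, deletion, and relabeling, and the choice of "*" as the VLDC symbol are all assumptions made for illustration.

```python
def vldc_distance(pattern: str, data: str, vldc: str = "*") -> int:
    """Edit distance between pattern and data, where each VLDC symbol in
    the pattern may stand for zero or more data symbols at no cost.

    Assumption: unit cost for insert, delete, and relabel (hypothetical
    cost model, chosen to reproduce the abstract's two examples).
    """
    n = len(data)
    # prev[j] = distance between the empty pattern prefix and data[:j]
    prev = list(range(n + 1))
    for p in pattern:
        if p == vldc:
            # The VLDC matches the empty string (prev[j]) or absorbs
            # one more data symbol (cur[j - 1]), both for free.
            cur = [prev[0]]
            for j in range(1, n + 1):
                cur.append(min(prev[j], cur[j - 1]))
        else:
            # Ordinary symbol: standard edit-distance recurrence.
            cur = [prev[0] + 1]
            for j in range(1, n + 1):
                cur.append(min(
                    prev[j] + 1,                       # delete p from the pattern
                    cur[j - 1] + 1,                    # insert data[j - 1]
                    prev[j - 1] + (p != data[j - 1]),  # match or relabel
                ))
        prev = cur
    return prev[n]


print(vldc_distance("com*er", "computer"))  # 0: "*" substitutes for "put"
print(vldc_distance("com*er", "counter"))   # 1: remove "m", "*" substitutes for "unt"
```

The table has (|pattern| + 1) × (|data| + 1) entries and only two rows are kept at a time, so the sketch runs in O(|pattern| × |data|) time. The tree algorithms in the paper carry the extra depth and leaf factors quoted in the abstract.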

Original language: English (US)
Pages (from-to): 33-66
Number of pages: 34
Journal: Journal of Algorithms
Volume: 16
Issue number: 1
DOIs: 10.1006/jagm.1994.1003
State: Published - Jan 1994

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

Approximate Tree Matching in the Presence of Variable Length Don't Cares. / Zhang, K. Z.; Shasha, Dennis; Wang, J. T. L.

In: Journal of Algorithms, Vol. 16, No. 1, 01.1994, p. 33-66.

@article{a03a6d373431449eb3d9c0b5d148b3e3,
title = "Approximate Tree Matching in the Presence of Variable Length Don't Cares",
abstract = "Ordered labeled trees are trees in which the sibling order matters. This paper presents algorithms for three problems having to do with approximate matching for such trees with variable length don't cares (VLDCs). In strings, a VLDC symbol in the pattern may substitute for zero or more symbols in the data string. For example, if {"}com*er{"} is the pattern, then the {"}*{"} would substitute for the substring {"}put{"} when matching the data string {"}computer.{"} Approximate VLDC matching in strings means that after the best possible substitution, the pattern still need not be the same as the data string for a match to be allowed. For example, {"}com*er{"} matches {"}counter{"} within distance one (representing the cost of removing the {"}m{"} from {"}com*er{"} and having the {"}*{"} substitute for {"}unt{"}). We generalize approximate VLDC string matching to three algorithms for approximate VLDC matching on trees. The time complexity of our algorithms is O(|P| × |D| × min(depth(P), leaves(P)) × min(depth(D), leaves(D))) (where |P| and |D| are the number of nodes respectively of the pattern P and the data tree D), the same as for the best approximate tree matching algorithm without VLDCs previously reported.",
author = "Zhang, {K. Z.} and Shasha, Dennis and Wang, {J. T. L.}",
year = "1994",
month = "1",
doi = "10.1006/jagm.1994.1003",
language = "English (US)",
volume = "16",
pages = "33--66",
journal = "Journal of Algorithms",
issn = "0196-6774",
publisher = "Academic Press Inc.",
number = "1",

}

TY - JOUR

T1 - Approximate Tree Matching in the Presence of Variable Length Don't Cares

AU - Zhang, K. Z.

AU - Shasha, Dennis

AU - Wang, J. T. L.

PY - 1994/1

Y1 - 1994/1

N2 - Ordered labeled trees are trees in which the sibling order matters. This paper presents algorithms for three problems having to do with approximate matching for such trees with variable length don't cares (VLDCs). In strings, a VLDC symbol in the pattern may substitute for zero or more symbols in the data string. For example, if "com*er" is the pattern, then the "*" would substitute for the substring "put" when matching the data string "computer." Approximate VLDC matching in strings means that after the best possible substitution, the pattern still need not be the same as the data string for a match to be allowed. For example, "com*er" matches "counter" within distance one (representing the cost of removing the "m" from "com*er" and having the "*" substitute for "unt"). We generalize approximate VLDC string matching to three algorithms for approximate VLDC matching on trees. The time complexity of our algorithms is O(|P| × |D| × min(depth(P), leaves(P)) × min(depth(D), leaves(D))) (where |P| and |D| are the number of nodes respectively of the pattern P and the data tree D), the same as for the best approximate tree matching algorithm without VLDCs previously reported.

UR - http://www.scopus.com/inward/record.url?scp=0003193164&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0003193164&partnerID=8YFLogxK

U2 - 10.1006/jagm.1994.1003

DO - 10.1006/jagm.1994.1003

M3 - Article

VL - 16

SP - 33

EP - 66

JO - Journal of Algorithms

JF - Journal of Algorithms

SN - 0196-6774

IS - 1

ER -