Unordered tree mining with applications to phylogeny

Dennis Shasha, Jason T. L. Wang, Sen Zhang

Research output: Chapter in Book/Report/Conference proceeding - Conference contribution

Abstract

Frequent structure mining (FSM) aims to discover and extract patterns that occur frequently in structural data, such as trees and graphs. FSM has many applications in bioinformatics, XML processing, Web log analysis, and related areas. In this paper we present a new FSM technique for finding patterns in rooted unordered labeled trees. The patterns of interest are cousin pairs in these trees. A cousin pair is a pair of nodes sharing the same parent, the same grandparent, the same great-grandparent, and so on. Given a tree T, our algorithm finds all interesting cousin pairs of T in O(|T|^2) time, where |T| is the number of nodes in T. Experimental results on synthetic data and phylogenies show the scalability and effectiveness of the proposed technique. To demonstrate the usefulness of our approach, we discuss its applications to locating co-occurring patterns in multiple evolutionary trees, evaluating the consensus of equally parsimonious trees, and finding kernel trees of groups of phylogenies. We also describe extensions of our algorithms for undirected acyclic graphs (or free trees).
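
The cousin-pair definition in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: it is a brute-force enumeration, roughly O(|T|^2 * h) for a tree of height h, written only to make the definition concrete. The function name cousin_pairs, the parent-map tree encoding, the max_height cutoff, and the reading of "sharing the same parent, grandparent, ..." as "having the same k-th ancestor" are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cousin_pairs(parent, labels, max_height=3):
    """Count label pairs whose nodes share the same k-th ancestor.

    parent: dict mapping each node to its parent (None for the root).
    labels: dict mapping each node to its label.
    Returns {(label_a, label_b, k): count}, where k = 1 means same parent,
    k = 2 same grandparent, and so on (hypothetical encoding).
    """
    def ancestors(v):
        # Chain of ancestors from v's parent up to the root.
        chain = []
        while parent[v] is not None:
            v = parent[v]
            chain.append(v)
        return chain

    chains = {v: ancestors(v) for v in parent}
    counts = defaultdict(int)
    for u, v in combinations(parent, 2):
        # Walk both ancestor chains in lockstep; the first level k at which
        # they meet is the closest shared k-th ancestor. A match at level k
        # implies that u and v are at the same depth.
        for k, (a, b) in enumerate(zip(chains[u], chains[v]), start=1):
            if a == b:
                if k <= max_height:
                    key = tuple(sorted((labels[u], labels[v]))) + (k,)
                    counts[key] += 1
                break
    return dict(counts)


# Toy tree: r has children x and y; x has children a and b; y has child c.
parent = {"r": None, "x": "r", "y": "r", "a": "x", "b": "x", "c": "y"}
labels = {node: node for node in parent}  # unordered tree, label = node name
result = cousin_pairs(parent, labels)
print(result)
```

On the toy tree, x and y (and likewise a and b) share a parent, while a and c (and b and c) share only a grandparent. Label pairs are sorted before counting because the trees are unordered, so (a, b) and (b, a) denote the same pattern.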

Original language: English (US)
Title of host publication: Proceedings - 20th International Conference on Data Engineering - ICDE 2004
Pages: 708-719
Number of pages: 12
Volume: 20
DOI: https://doi.org/10.1109/ICDE.2004.1320039
State: Published - 2004
Event: Proceedings - 20th International Conference on Data Engineering - ICDE 2004 - Boston, MA, United States
Duration: Mar 30 2004 to Apr 2 2004

Other

Other: Proceedings - 20th International Conference on Data Engineering - ICDE 2004
Country: United States
City: Boston, MA
Period: 3/30/04 to 4/2/04

Fingerprint

Bioinformatics
XML
World Wide Web
Scalability
Processing
Phylogeny

ASJC Scopus subject areas

  • Software
  • Engineering (all)
  • Engineering (miscellaneous)

Cite this

Shasha, D., Wang, J. T. L., & Zhang, S. (2004). Unordered tree mining with applications to phylogeny. In Proceedings - 20th International Conference on Data Engineering - ICDE 2004 (Vol. 20, pp. 708-719). https://doi.org/10.1109/ICDE.2004.1320039
