Finding patterns in three-dimensional graphs: Algorithms and applications to scientific data mining

Xiong Wang, Jason T L Wang, Dennis Shasha, Bruce A. Shapiro, Isidore Rigoutsos, Kaizhong Zhang

Research output: Contribution to journal, Article

Abstract

This paper presents a method for finding patterns in 3D graphs. Each node in a graph is an undecomposable or atomic unit and has a label. Edges are links between the atomic units. Patterns are rigid substructures that may occur in a graph after allowing for an arbitrary number of whole-structure rotations and translations as well as a small, user-specified number of edit operations in the patterns or in the graph. (When a pattern appears in a graph only after the graph has been modified, we call that appearance an "approximate occurrence.") The edit operations include relabeling a node, deleting a node, and inserting a node. The proposed method is based on the geometric hashing technique, which hashes node-triplets of the graphs into a 3D table and compresses the label-triplets in the table. To demonstrate the utility of our algorithms, we discuss two applications in scientific data mining. First, we apply the method to locating frequently occurring motifs in two families of proteins pertaining to RNA-directed DNA Polymerase and Thymidylate Synthase, and use the motifs to classify the proteins. Then, we apply the method to clustering chemical compounds pertaining to aromatic, bicyclic alkanes, and photosynthesis. Experimental results indicate the good performance of our algorithms and high recall and precision rates for both classification and clustering.
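The hashing idea in the abstract can be illustrated with a minimal sketch. It is not the paper's actual hash function: it assumes, for illustration only, that a node-triplet is keyed by its three sorted, quantized pairwise distances (which are unchanged by rotation and translation) together with its sorted labels. The names `triplet_key`, `build_table`, `vote`, `rigid_move`, and the `bin_width` parameter are all hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def triplet_key(p, q, r, bin_width=0.1):
    """Key that is invariant under rotation and translation:
    the three side lengths of the triangle, sorted and quantized.
    (Assumed key; the paper's own hash function may differ.)"""
    sides = sorted((math.dist(p, q), math.dist(p, r), math.dist(q, r)))
    return tuple(round(s / bin_width) for s in sides)


def build_table(graphs, bin_width=0.1):
    """Hash every node-triplet of every graph.
    graphs: {graph_id: [(label, (x, y, z)), ...]}"""
    table = defaultdict(set)
    for gid, nodes in graphs.items():
        for a, b, c in combinations(nodes, 3):
            key = triplet_key(a[1], b[1], c[1], bin_width)
            labels = tuple(sorted((a[0], b[0], c[0])))
            table[(key, labels)].add(gid)
    return table


def vote(table, pattern, bin_width=0.1):
    """Count, per graph, how many of the pattern's triplets hit the table."""
    votes = defaultdict(int)
    for a, b, c in combinations(pattern, 3):
        key = triplet_key(a[1], b[1], c[1], bin_width)
        labels = tuple(sorted((a[0], b[0], c[0])))
        for gid in table.get((key, labels), ()):
            votes[gid] += 1
    return dict(votes)


def rigid_move(nodes, theta, shift):
    """Rotate nodes about the z-axis by theta, then translate by shift."""
    c, s = math.cos(theta), math.sin(theta)
    return [
        (lab, (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2]))
        for lab, (x, y, z) in nodes
    ]


# Usage: a 4-node pattern hidden in graph "g1" after a rigid motion.
pattern = [("C", (0.0, 0.0, 0.0)), ("N", (1.5, 0.0, 0.0)),
           ("O", (0.0, 2.0, 0.0)), ("C", (0.0, 0.0, 1.0))]
graphs = {
    "g1": rigid_move(pattern, 0.7, (3.0, -1.0, 2.0)) + [("H", (9.0, 9.0, 9.0))],
    "g2": [("S", (0.0, 0.0, 0.0)), ("S", (1.0, 0.0, 0.0)), ("S", (0.0, 1.0, 0.0))],
}
table = build_table(graphs)
result = vote(table, pattern)
```

A graph that collects votes from most of the pattern's triplets is a candidate occurrence. This sketch covers only exact rigid matches; the edit operations (relabel, delete, insert) that define approximate occurrence are not modeled, and distances that fall near a bin boundary can land in different bins after a rigid motion.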

Original language: English (US)
Pages (from-to): 731-749
Number of pages: 19
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 14
Issue number: 4
DOI: 10.1109/TKDE.2002.1019211
State: Published - Jul 2002

Fingerprint

Data mining
Proteins
Chemical compounds
Photosynthesis
RNA
Labels
DNA

Keywords

  • Biochemistry
  • Classification and clustering
  • Data mining
  • Geometric hashing
  • KDD
  • Medicine
  • Structural pattern discovery

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Electrical and Electronic Engineering
  • Artificial Intelligence
  • Information Systems

Cite this

Finding patterns in three-dimensional graphs: Algorithms and applications to scientific data mining. / Wang, Xiong; Wang, Jason T L; Shasha, Dennis; Shapiro, Bruce A.; Rigoutsos, Isidore; Zhang, Kaizhong.

In: IEEE Transactions on Knowledge and Data Engineering, Vol. 14, No. 4, 07.2002, p. 731-749.

@article{daecd9c1fd6544ffb9f2cecdd5c6f6c5,
title = "Finding patterns in three-dimensional graphs: Algorithms and applications to scientific data mining",
keywords = "Biochemistry, Classification and clustering, Data mining, Geometric hashing, KDD, Medicine, Structural pattern discovery",
author = "Xiong Wang and Wang, {Jason T L} and Dennis Shasha and Shapiro, {Bruce A.} and Isidore Rigoutsos and Kaizhong Zhang",
year = "2002",
month = "7",
doi = "10.1109/TKDE.2002.1019211",
language = "English (US)",
volume = "14",
pages = "731--749",
journal = "IEEE Transactions on Knowledge and Data Engineering",
issn = "1041-4347",
publisher = "IEEE Computer Society",
number = "4",

}

TY - JOUR

T1 - Finding patterns in three-dimensional graphs

T2 - Algorithms and applications to scientific data mining

AU - Wang, Xiong

AU - Wang, Jason T L

AU - Shasha, Dennis

AU - Shapiro, Bruce A.

AU - Rigoutsos, Isidore

AU - Zhang, Kaizhong

PY - 2002/7

Y1 - 2002/7

KW - Biochemistry

KW - Classification and clustering

KW - Data mining

KW - Geometric hashing

KW - KDD

KW - Medicine

KW - Structural pattern discovery

UR - http://www.scopus.com/inward/record.url?scp=0036650077&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0036650077&partnerID=8YFLogxK

U2 - 10.1109/TKDE.2002.1019211

DO - 10.1109/TKDE.2002.1019211

M3 - Article

AN - SCOPUS:0036650077

VL - 14

SP - 731

EP - 749

JO - IEEE Transactions on Knowledge and Data Engineering

JF - IEEE Transactions on Knowledge and Data Engineering

SN - 1041-4347

IS - 4

ER -