GraphClust: A method for clustering database of graphs

Diego Reforgiato, Rodrigo Gutierrez, Dennis Shasha

Research output: Contribution to journal (Article)

Abstract

Any application that represents data as sets of graphs may benefit from the discovery of relationships among those graphs. To do this in an unsupervised fashion requires the ability to find graphs that are similar to one another. That is the purpose of GraphClust. The GraphClust algorithm proceeds in three phases, often building on other tools: (1) it finds highly connected substructures in each graph; (2) it uses those substructures to represent each graph as a feature vector; and (3) it clusters these feature vectors using a standard distance measure. We validate the cluster quality by using the Silhouette method. In addition to clustering graphs, GraphClust uses singular value decomposition (SVD) to find frequently co-occurring connected substructures. The main novelty of GraphClust compared to previous methods is that it is application-independent and scalable to many large graphs.
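The short Python sketch below illustrates the three phases, the Silhouette check, and the SVD step described in the abstract. It is not the authors' implementation, and several pieces are assumptions made only for illustration:

- Every node label is assumed to mean the same thing across graphs (for example, a word in a text graph).
- Triangles (3-cliques) stand in for the "highly connected substructures" of phase (1).
- Cosine distance stands in for the "standard distance measure".
- Average-link agglomerative clustering stands in for the clustering step.
- Power iteration on XᵀX stands in for the full SVD. It yields the leading right singular vector of the graph-by-substructure matrix X.

All function names (`triangles`, `feature_vectors`, `cosine_distance`, `cluster`, `silhouette`, `top_cooccurring`) are hypothetical.

```python
import math
from itertools import combinations

def triangles(edges):
    """Phase (1) stand-in (assumption): every 3-clique, keyed by its node labels."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    found = set()
    for u, v in edges:
        for w in adj[u] & adj[v]:          # a common neighbour closes a triangle
            tri = frozenset((u, v, w))
            if len(tri) == 3:              # skip degenerate self-loop cases
                found.add(tri)
    return found

def feature_vectors(graphs):
    """Phase (2): one binary vector per graph over every substructure seen."""
    subs = [triangles(g) for g in graphs]
    vocab = sorted(set().union(*subs), key=sorted)
    return vocab, [[1.0 if s in sub else 0.0 for s in vocab] for sub in subs]

def cosine_distance(a, b):
    """Assumed 'standard distance measure': 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)

def cluster(vectors, k):
    """Phase (3) stand-in: average-link agglomerative clustering down to k clusters."""
    clusters = [[i] for i in range(len(vectors))]

    def link(c1, c2):
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        clusters[a].extend(clusters[b])    # a < b, so deleting b keeps index a valid
        del clusters[b]
    return clusters

def silhouette(vectors, clusters):
    """Mean Silhouette coefficient; requires at least two clusters."""
    label = {i: c for c, members in enumerate(clusters) for i in members}
    scores = []
    for i in range(len(vectors)):
        own = clusters[label[i]]
        if len(own) == 1:
            scores.append(0.0)             # usual convention for singletons
            continue
        a = sum(cosine_distance(vectors[i], vectors[j]) for j in own if j != i) / (len(own) - 1)
        b = min(sum(cosine_distance(vectors[i], vectors[j]) for j in other) / len(other)
                for c, other in enumerate(clusters) if c != label[i])
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def top_cooccurring(vectors, iters=100):
    """SVD stand-in: leading right singular vector of X via power iteration on X^T X."""
    n = len(vectors[0])
    v = [1.0] * n
    for _ in range(iters):
        xv = [sum(row[j] * v[j] for j in range(n)) for row in vectors]          # X v
        w = [sum(row[j] * s for row, s in zip(vectors, xv)) for j in range(n)]  # X^T (X v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v

if __name__ == "__main__":
    graphs = [
        [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("b", "d")],
        [("a", "b"), ("b", "c"), ("a", "c")],
        [("x", "y"), ("y", "z"), ("x", "z")],
        [("x", "y"), ("y", "z"), ("x", "z"), ("z", "w")],
    ]
    vocab, X = feature_vectors(graphs)
    clusters = cluster(X, k=2)
    print(clusters)                           # [[0, 1], [2, 3]]
    print(round(silhouette(X, clusters), 4))  # 0.8536
    print([sorted(s) for s in vocab])
    print([round(x, 3) for x in top_cooccurring(X)])
```

On these four toy graphs, the two graphs built from labels a, b, c, d form one cluster and the two x, y, z graphs form the other. The leading singular vector gives more weight to the triangles {a, b, c} and {b, c, d} than to {x, y, z}. This happens because the first two co-occur in the same graph, even though {x, y, z} appears in as many graphs as {a, b, c}.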

Original language: English (US)
Pages (from-to): 231-241
Number of pages: 11
Journal: Journal of Information and Knowledge Management
Volume: 7
Issue number: 4
DOI: 10.1142/S0219649208002093
State: Published - 2008

Keywords

  • document vectors
  • graph clustering
  • graph substructure matching
  • Text clustering

ASJC Scopus subject areas

  • Library and Information Sciences
  • Computer Networks and Communications
  • Computer Science Applications

Cite this

GraphClust: A method for clustering database of graphs. / Reforgiato, Diego; Gutierrez, Rodrigo; Shasha, Dennis.

In: Journal of Information and Knowledge Management, Vol. 7, No. 4, 2008, p. 231-241.

Reforgiato, Diego; Gutierrez, Rodrigo; Shasha, Dennis. / GraphClust: A method for clustering database of graphs. In: Journal of Information and Knowledge Management. 2008; Vol. 7, No. 4. pp. 231-241.

@article{dc8e02419f7f4b8ab1f69c5c97c34632,
title = "GraphClust: A method for clustering database of graphs",
abstract = "Any application that represents data as sets of graphs may benefit from the discovery of relationships among those graphs. To do this in an unsupervised fashion requires the ability to find graphs that are similar to one another. That is the purpose of GraphClust. The GraphClust algorithm proceeds in three phases, often building on other tools: (1) it finds highly connected substructures in each graph; (2) it uses those substructures to represent each graph as a feature vector; and (3) it clusters these feature vectors using a standard distance measure. We validate the cluster quality by using the Silhouette method. In addition to clustering graphs, GraphClust uses SVD decomposition to find frequently co-occurring connected substructures. The main novelty of GraphClust compared to previous methods is that it is application-independent and scalable to many large graphs.",
keywords = "document vectors, graph clustering, graph substructure matching, Text clustering",
author = "Diego Reforgiato and Rodrigo Gutierrez and Dennis Shasha",
year = "2008",
doi = "10.1142/S0219649208002093",
language = "English (US)",
volume = "7",
pages = "231--241",
journal = "Journal of Information and Knowledge Management",
issn = "0219-6492",
publisher = "World Scientific Publishing Co.",
number = "4",

}

TY  - JOUR
T1  - GraphClust: A method for clustering database of graphs
AU  - Reforgiato, Diego
AU  - Gutierrez, Rodrigo
AU  - Shasha, Dennis
PY  - 2008
Y1  - 2008
N2  - Any application that represents data as sets of graphs may benefit from the discovery of relationships among those graphs. To do this in an unsupervised fashion requires the ability to find graphs that are similar to one another. That is the purpose of GraphClust. The GraphClust algorithm proceeds in three phases, often building on other tools: (1) it finds highly connected substructures in each graph; (2) it uses those substructures to represent each graph as a feature vector; and (3) it clusters these feature vectors using a standard distance measure. We validate the cluster quality by using the Silhouette method. In addition to clustering graphs, GraphClust uses SVD decomposition to find frequently co-occurring connected substructures. The main novelty of GraphClust compared to previous methods is that it is application-independent and scalable to many large graphs.
AB  - Any application that represents data as sets of graphs may benefit from the discovery of relationships among those graphs. To do this in an unsupervised fashion requires the ability to find graphs that are similar to one another. That is the purpose of GraphClust. The GraphClust algorithm proceeds in three phases, often building on other tools: (1) it finds highly connected substructures in each graph; (2) it uses those substructures to represent each graph as a feature vector; and (3) it clusters these feature vectors using a standard distance measure. We validate the cluster quality by using the Silhouette method. In addition to clustering graphs, GraphClust uses SVD decomposition to find frequently co-occurring connected substructures. The main novelty of GraphClust compared to previous methods is that it is application-independent and scalable to many large graphs.
KW  - document vectors
KW  - graph clustering
KW  - graph substructure matching
KW  - Text clustering
UR  - http://www.scopus.com/inward/record.url?scp=80052507597&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=80052507597&partnerID=8YFLogxK
U2  - 10.1142/S0219649208002093
DO  - 10.1142/S0219649208002093
M3  - Article
AN  - SCOPUS:80052507597
VL  - 7
SP  - 231
EP  - 241
JO  - Journal of Information and Knowledge Management
JF  - Journal of Information and Knowledge Management
SN  - 0219-6492
IS  - 4
ER  - 