A subgraph isomorphism algorithm and its application to biochemical data

Vincenzo Bonnici, Rosalba Giugno, Alfredo Pulvirenti, Dennis Shasha, Alfredo Ferro

Research output: Contribution to journal (Article)

Abstract

Background: Graphs can represent biological networks at the molecular, protein, or species level. An important query is to find all matches of a pattern graph to a target graph. Accomplishing this is inherently difficult (NP-complete) and the efficiency of heuristic algorithms for the problem may depend upon the input graphs. The common aim of existing algorithms is to eliminate unsuccessful mappings as early as and as inexpensively as possible.

Results: We propose a new subgraph isomorphism algorithm which applies a search strategy to significantly reduce the search space without using any complex pruning rules or domain reduction procedures. We compare our method with the most recent and efficient subgraph isomorphism algorithms (VFlib, LAD, and our C++ implementation of FocusSearch which was originally distributed in Modula2) on synthetic, molecules, and interaction networks data. We show a significant reduction in the running time of our approach compared with these other excellent methods and show that our algorithm scales well as memory demands increase.

Conclusions: Subgraph isomorphism algorithms are intensively used by biochemical tools. Our analysis gives a comprehensive comparison of different software approaches to subgraph isomorphism highlighting their weaknesses and strengths. This will help researchers make a rational choice among methods depending on their application. We also distribute an open-source package including our system and our own C++ implementation of FocusSearch together with all the used datasets (http://ferrolab.dmi.unict.it/ri.html). In future work, our findings may be extended to approximate subgraph isomorphism algorithms.
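The query described in the abstract (find all matches of a pattern graph in a target graph, discarding unsuccessful mappings early) can be illustrated with a small backtracking sketch. This is not the authors' algorithm or code: the function name `find_matches`, the adjacency-set graph encoding, the connectivity-first vertex ordering, and the choice of non-induced matching (every pattern edge must map to a target edge) are all assumptions made for illustration.

```python
from typing import Dict, Iterator, List, Set

Graph = Dict[int, Set[int]]  # undirected graph as adjacency sets (assumed encoding)


def find_matches(pattern: Graph, target: Graph) -> Iterator[Dict[int, int]]:
    """Yield every injective map pattern vertex -> target vertex that sends
    each pattern edge onto a target edge (non-induced matching)."""
    # Fix a search order before matching: prefer the pattern vertex with the
    # most neighbours already in the order, then the highest degree.
    order: List[int] = []
    remaining = set(pattern)
    while remaining:
        nxt = max(
            remaining,
            key=lambda v: (len(pattern[v] & set(order)), len(pattern[v]), -v),
        )
        order.append(nxt)
        remaining.remove(nxt)

    mapping: Dict[int, int] = {}
    used: Set[int] = set()

    def extend(depth: int) -> Iterator[Dict[int, int]]:
        if depth == len(order):
            yield dict(mapping)  # copy, because mapping is mutated on backtrack
            return
        p = order[depth]
        # Images of the neighbours of p that are already matched.
        mapped_nbrs = [mapping[q] for q in pattern[p] if q in mapping]
        for t in target:
            # Cheap rejections first: vertex already used, or degree too small.
            if t in used or len(target[t]) < len(pattern[p]):
                continue
            # Every already-matched pattern edge at p must exist in the target.
            if all(m in target[t] for m in mapped_nbrs):
                mapping[p] = t
                used.add(t)
                yield from extend(depth + 1)
                del mapping[p]
                used.remove(t)

    yield from extend(0)


# Hypothetical example data: a triangle pattern and a 4-cycle with one diagonal.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
square_with_diagonal = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
matches = list(find_matches(triangle, square_with_diagonal))
```

In this example the target contains two triangles, and each can be matched in six vertex orders, so `matches` holds 12 mappings. Ordering the pattern vertices so that each new vertex touches as many already-matched vertices as possible tends to expose edge conflicts early, which is the general idea of reducing the search space through the search strategy rather than through extra pruning rules.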

Original language: English (US)
Article number: S13
Journal: BMC Bioinformatics
Volume: 14
Issue number: SUPPL7
DOI: 10.1186/1471-2105-14-S7-S13
State: Published - Apr 22 2013

Keywords

  • Algorithms comparisons and distributions
  • Biochemical graph data
  • Search strategies
  • Subgraph isomorphism algorithms

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics
  • Structural Biology
  • Medicine(all)

Cite this

A subgraph isomorphism algorithm and its application to biochemical data. / Bonnici, Vincenzo; Giugno, Rosalba; Pulvirenti, Alfredo; Shasha, Dennis; Ferro, Alfredo.

In: BMC Bioinformatics, Vol. 14, No. SUPPL7, S13, 22.04.2013.

@article{f218b37621e549bd86d413a3494c87f3,
title = "A subgraph isomorphism algorithm and its application to biochemical data",
abstract = "Background: Graphs can represent biological networks at the molecular, protein, or species level. An important query is to find all matches of a pattern graph to a target graph. Accomplishing this is inherently difficult (NP-complete) and the efficiency of heuristic algorithms for the problem may depend upon the input graphs. The common aim of existing algorithms is to eliminate unsuccessful mappings as early as and as inexpensively as possible. Results: We propose a new subgraph isomorphism algorithm which applies a search strategy to significantly reduce the search space without using any complex pruning rules or domain reduction procedures. We compare our method with the most recent and efficient subgraph isomorphism algorithms (VFlib, LAD, and our C++ implementation of FocusSearch which was originally distributed in Modula2) on synthetic, molecules, and interaction networks data. We show a significant reduction in the running time of our approach compared with these other excellent methods and show that our algorithm scales well as memory demands increase. Conclusions: Subgraph isomorphism algorithms are intensively used by biochemical tools. Our analysis gives a comprehensive comparison of different software approaches to subgraph isomorphism highlighting their weaknesses and strengths. This will help researchers make a rational choice among methods depending on their application. We also distribute an open-source package including our system and our own C++ implementation of FocusSearch together with all the used datasets (http://ferrolab.dmi.unict.it/ri.html). In future work, our findings may be extended to approximate subgraph isomorphism algorithms.",
keywords = "Algorithms comparisons and distributions, Biochemical graph data, Search strategies, Subgraph isomorphism algorithms",
author = "Vincenzo Bonnici and Rosalba Giugno and Alfredo Pulvirenti and Dennis Shasha and Alfredo Ferro",
year = "2013",
month = "4",
day = "22",
doi = "10.1186/1471-2105-14-S7-S13",
language = "English (US)",
volume = "14",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",
number = "SUPPL7",

}

TY - JOUR

T1 - A subgraph isomorphism algorithm and its application to biochemical data

AU - Bonnici, Vincenzo

AU - Giugno, Rosalba

AU - Pulvirenti, Alfredo

AU - Shasha, Dennis

AU - Ferro, Alfredo

PY - 2013/4/22

Y1 - 2013/4/22

AB - Background: Graphs can represent biological networks at the molecular, protein, or species level. An important query is to find all matches of a pattern graph to a target graph. Accomplishing this is inherently difficult (NP-complete) and the efficiency of heuristic algorithms for the problem may depend upon the input graphs. The common aim of existing algorithms is to eliminate unsuccessful mappings as early as and as inexpensively as possible. Results: We propose a new subgraph isomorphism algorithm which applies a search strategy to significantly reduce the search space without using any complex pruning rules or domain reduction procedures. We compare our method with the most recent and efficient subgraph isomorphism algorithms (VFlib, LAD, and our C++ implementation of FocusSearch which was originally distributed in Modula2) on synthetic, molecules, and interaction networks data. We show a significant reduction in the running time of our approach compared with these other excellent methods and show that our algorithm scales well as memory demands increase. Conclusions: Subgraph isomorphism algorithms are intensively used by biochemical tools. Our analysis gives a comprehensive comparison of different software approaches to subgraph isomorphism highlighting their weaknesses and strengths. This will help researchers make a rational choice among methods depending on their application. We also distribute an open-source package including our system and our own C++ implementation of FocusSearch together with all the used datasets (http://ferrolab.dmi.unict.it/ri.html). In future work, our findings may be extended to approximate subgraph isomorphism algorithms.

KW - Algorithms comparisons and distributions

KW - Biochemical graph data

KW - Search strategies

KW - Subgraph isomorphism algorithms

UR - http://www.scopus.com/inward/record.url?scp=84887032278&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84887032278&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-14-S7-S13

DO - 10.1186/1471-2105-14-S7-S13

M3 - Article

VL - 14

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

IS - SUPPL7

M1 - S13

ER -