Fast analytical methods for finding significant labeled graph motifs

Giovanni Micale, Rosalba Giugno, Alfredo Ferro, Misael Mongiovì, Dennis Shasha, Alfredo Pulvirenti

Research output: Contribution to journal › Article

Abstract

Network motif discovery is the problem of finding subgraphs of a network that occur more frequently than expected, according to some reasonable null hypothesis. Such subgraphs may indicate small-scale interaction features in genomic interaction networks, or intriguing relationships among actors or among airlines. When nodes are labeled, they can carry information such as the genomic entity under study or the dominant genre of an actor. For that reason, labeled subgraphs convey information beyond structure and may therefore find more applications. To identify statistically significant motifs in a given network, we propose an analytical (i.e., simulation-free) method that extends the works of Picard et al. (J Comput Biol 15(1):1–20, 2008) and Schbath et al. (J Bioinform Syst Biol 2009(1):616234, 2009) to label-dependent scale-free graph models. We provide an analytical expression for the mean and variance of the motif count under the Expected Degree Distribution random graph model. Our model deals with both induced and non-induced motifs. We have tested our methodology on a wide set of graphs ranging from protein–protein interaction networks to movie networks. The analytical model is a fast alternative to simulation, usually faster by orders of magnitude, and this advantage increases as graphs grow in size.
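As a rough illustration of what "simulation-free" means here, the sketch below computes the expected count of an unlabeled, non-induced motif in closed form. It assumes the unlabeled Expected Degree Distribution formulation in which an edge between nodes i and j appears with probability gamma * D_i * D_j, with gamma = 1 / ((n - 1) * E[D]), and it estimates the moments of D from an observed degree sequence. It is not the label-dependent model of the article, the variance expression is not reproduced, and all function and parameter names are hypothetical.

```python
from math import comb, factorial, sqrt


def edd_expected_count(degrees, motif_edges, k, n_automorphisms):
    """Expected non-induced count of an unlabeled k-node motif (assumed EDD form)."""
    n = len(degrees)
    mean_d = sum(degrees) / n
    gamma = 1.0 / ((n - 1) * mean_d)  # assumed normalising constant

    def moment(p):
        # Empirical p-th moment of the degree distribution.
        return sum(d ** p for d in degrees) / n

    # Degree of each motif node inside the motif itself.
    motif_deg = [0] * k
    for u, v in motif_edges:
        motif_deg[u] += 1
        motif_deg[v] += 1

    # Occurrence probability at one position: gamma^(#edges) * prod E[D^deg(u)].
    occurrence_prob = gamma ** len(motif_edges)
    for deg in motif_deg:
        occurrence_prob *= moment(deg)

    # Number of positions: node subsets times distinct placements of the motif.
    positions = comb(n, k) * (factorial(k) // n_automorphisms)
    return positions * occurrence_prob


def z_score(observed, mean, variance):
    """Standardised deviation of an observed count from its expectation."""
    return (observed - mean) / sqrt(variance)


# Sanity check on a degree-regular sequence, where the model reduces to
# an Erdos-Renyi graph with edge probability 0.5.
degrees = [2] * 5
triangle = [(0, 1), (1, 2), (0, 2)]
path = [(0, 1), (1, 2)]
expected_triangles = edd_expected_count(degrees, triangle, k=3, n_automorphisms=6)
expected_paths = edd_expected_count(degrees, path, k=3, n_automorphisms=2)
```

With a constant degree sequence the result agrees with the Erdős–Rényi count C(n, 3) p^3, which is a convenient check that the normalisation is consistent. The `z_score` helper takes the variance as an input because that term would have to come from the analytical expression in the article.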

Original language: English (US)
Pages (from-to): 1-28
Number of pages: 28
Journal: Data Mining and Knowledge Discovery
DOIs: 10.1007/s10618-017-0544-8
State: Accepted/In press - Nov 2 2017

Fingerprint

Labels
Analytical models

Keywords

  • Graph algorithms
  • Labeled graph motifs
  • Network mining
  • Random network models

ASJC Scopus subject areas

  • Information Systems
  • Computer Science Applications
  • Computer Networks and Communications

Cite this

Fast analytical methods for finding significant labeled graph motifs. / Micale, Giovanni; Giugno, Rosalba; Ferro, Alfredo; Mongiovì, Misael; Shasha, Dennis; Pulvirenti, Alfredo.

In: Data Mining and Knowledge Discovery, 02.11.2017, p. 1-28.

@article{832323ea6c374496b7c9d53ffb6bfbcf,
title = "Fast analytical methods for finding significant labeled graph motifs",
abstract = "Network motif discovery is the problem of finding subgraphs of a network that occur more frequently than expected, according to some reasonable null hypothesis. Such subgraphs may indicate small scale interaction features in genomic interaction networks or intriguing relationships involving actors or a relationship among airlines. When nodes are labeled, they can carry information such as the genomic entity under study or the dominant genre of an actor. For that reason, labeled subgraphs convey information beyond structure and could therefore enjoy more applications. To identify statistically significant motifs in a given network, we propose an analytical method (i.e. simulation-free) that extends the works of Picard et al. (J Comput Biol 15(1):1–20, 2008) and Schbath et al. (J Bioinform Syst Biol 2009(1):616234, 2009) to label-dependent scale-free graph models. We provide an analytical expression of the mean and variance of the count under the Expected Degree Distribution random graph model. Our model deals with both induced and non-induced motifs. We have tested our methodology on a wide set of graphs ranging from protein–protein interaction networks to movie networks. The analytical model is a fast (usually faster by orders of magnitude) alternative to simulation. This advantage increases as graphs grow in size.",
keywords = "Graph algorithms, Labeled graph motifs, Network mining, Random network models",
author = "Giovanni Micale and Rosalba Giugno and Alfredo Ferro and Misael Mongiov{\`i} and Dennis Shasha and Alfredo Pulvirenti",
year = "2017",
month = "11",
day = "2",
doi = "10.1007/s10618-017-0544-8",
language = "English (US)",
pages = "1--28",
journal = "Data Mining and Knowledge Discovery",
issn = "1384-5810",
publisher = "Springer Netherlands",

}

TY - JOUR
T1 - Fast analytical methods for finding significant labeled graph motifs
AU - Micale, Giovanni
AU - Giugno, Rosalba
AU - Ferro, Alfredo
AU - Mongiovì, Misael
AU - Shasha, Dennis
AU - Pulvirenti, Alfredo
PY - 2017/11/2
Y1 - 2017/11/2

AB - Network motif discovery is the problem of finding subgraphs of a network that occur more frequently than expected, according to some reasonable null hypothesis. Such subgraphs may indicate small scale interaction features in genomic interaction networks or intriguing relationships involving actors or a relationship among airlines. When nodes are labeled, they can carry information such as the genomic entity under study or the dominant genre of an actor. For that reason, labeled subgraphs convey information beyond structure and could therefore enjoy more applications. To identify statistically significant motifs in a given network, we propose an analytical method (i.e. simulation-free) that extends the works of Picard et al. (J Comput Biol 15(1):1–20, 2008) and Schbath et al. (J Bioinform Syst Biol 2009(1):616234, 2009) to label-dependent scale-free graph models. We provide an analytical expression of the mean and variance of the count under the Expected Degree Distribution random graph model. Our model deals with both induced and non-induced motifs. We have tested our methodology on a wide set of graphs ranging from protein–protein interaction networks to movie networks. The analytical model is a fast (usually faster by orders of magnitude) alternative to simulation. This advantage increases as graphs grow in size.

KW - Graph algorithms
KW - Labeled graph motifs
KW - Network mining
KW - Random network models
UR - http://www.scopus.com/inward/record.url?scp=85027895581&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85027895581&partnerID=8YFLogxK
U2 - 10.1007/s10618-017-0544-8
DO - 10.1007/s10618-017-0544-8
M3 - Article
AN - SCOPUS:85027895581
SP - 1
EP - 28
JO - Data Mining and Knowledge Discovery
JF - Data Mining and Knowledge Discovery
SN - 1384-5810
ER -