Prediction of protein functions with gene ontology and interspecies protein homology data

Antonina Mitrofanova, Vladimir Pavlovic, Bhubaneswar Mishra

Research output: Contribution to journal › Article

Abstract

Accurate computational prediction of protein functions increasingly relies on network-inspired models for protein function transfer. This task can become challenging for proteins isolated in their own network or those with poor or uncharacterized neighborhoods. Here, we present a novel probabilistic chain-graph-based approach for predicting protein functions that builds on connecting networks of two (or more) different species by links of high interspecies sequence homology. In this way, proteins are able to exchange functional information with their homologous neighbors from a different species. The knowledge of interspecies relationships, such as sequence homology, can become crucial in cases of limited information from other sources of data, including protein-protein interactions or cellular locations of proteins. We further enhance our model to account for the Gene Ontology dependencies by linking multiple but related functional ontology categories within and across multiple species. The resulting networks are of significantly higher complexity than most traditional protein network models. We comprehensively benchmark our method by applying it to the two largest protein networks, those of Yeast and Fly. The joint Fly-Yeast network provides substantial improvements in precision, accuracy, and false positive rate over networks that consider either of the sources in isolation. At the same time, the new model retains computational efficiency similar to that of the simpler networks.
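
The idea of letting an isolated protein borrow functional information through an interspecies homology link can be illustrated with a minimal sketch. This is not the chain-graph model of the paper: it swaps in plain neighbor averaging as a stand-in for probabilistic inference, and every name in it (`build_joint_graph`, `propagate`, the protein identifiers `Y1`, `Y2`, `F1`, the prior of 0.5) is a hypothetical chosen for illustration.

```python
from collections import defaultdict


def build_joint_graph(ppi_edges, homology_edges):
    """Return an undirected adjacency map over proteins of both species.

    Within-species interaction edges and cross-species homology edges are
    treated alike here, which is a simplification made for this sketch.
    """
    graph = defaultdict(set)
    for a, b in list(ppi_edges) + list(homology_edges):
        graph[a].add(b)
        graph[b].add(a)
    return graph


def propagate(proteins, graph, known, iterations=20, prior=0.5):
    """Score each protein for one function by averaging neighbor scores.

    known maps protein -> 1.0 (annotated with the function) or 0.0 (not).
    Annotated proteins stay clamped; proteins without neighbors keep the prior.
    """
    scores = {p: known.get(p, prior) for p in proteins}
    for _ in range(iterations):
        updated = dict(scores)
        for p in proteins:
            neighbors = graph.get(p)
            if p in known or not neighbors:
                continue
            updated[p] = sum(scores[n] for n in neighbors) / len(neighbors)
        scores = updated
    return scores


# Hypothetical toy data: F1 has no interaction partners in its own network.
proteins = ["Y1", "Y2", "F1"]
yeast_ppi = [("Y1", "Y2")]
homology = [("Y1", "F1")]
known = {"Y1": 1.0}

isolated = propagate(proteins, build_joint_graph(yeast_ppi, []), known)
joint = propagate(proteins, build_joint_graph(yeast_ppi, homology), known)
print(isolated["F1"], joint["F1"])  # 0.5 1.0
```

Without the homology edge, `F1` never moves off the prior; with it, `F1` inherits the annotation of its homolog `Y1`, which is the situation the abstract describes for proteins with poor or uncharacterized neighborhoods.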

Original language: English (US)
Article number: 5432154
Pages (from-to): 775-784
Number of pages: 10
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Volume: 8
Issue number: 3
DOI: 10.1109/TCBB.2010.15
State: Published - 2011

Fingerprint

Gene Ontology
Ontology
Homology
Genes
Proteins
Protein
Prediction
Yeast
Sequence Homology
Diptera
Chain Graph
Benchmarking
Protein-protein Interaction
Fungal Proteins
Information Storage and Retrieval
False Positive
Computational Efficiency
Transfer Function
Network Model
Linking

Keywords

  • bioinformatics (genome or protein) databases
  • Biology and genetics
  • machine learning

ASJC Scopus subject areas

  • Biotechnology
  • Genetics
  • Applied Mathematics
  • Medicine(all)

Cite this

Prediction of protein functions with gene ontology and interspecies protein homology data. / Mitrofanova, Antonina; Pavlovic, Vladimir; Mishra, Bhubaneswar.

In: IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 8, No. 3, 5432154, 2011, p. 775-784.

@article{8c910d8895014691b133cf4720769324,
title = "Prediction of protein functions with gene ontology and interspecies protein homology data",
abstract = "Accurate computational prediction of protein functions increasingly relies on network-inspired models for the protein function transfer. This task can become challenging for proteins isolated in their own network or those with poor or uncharacterized neighborhoods. Here, we present a novel probabilistic chain-graph-based approach for predicting protein functions that builds on connecting networks of two (or more) different species by links of high interspecies sequence homology. In this way, proteins are able to exchange functional information with their neighbors-homologs from a different species. The knowledge of interspecies relationships, such as the sequence homology, can become crucial in cases of limited information from other sources of data, including the protein-protein interactions or cellular locations of proteins. We further enhance our model to account for the Gene Ontology dependencies by linking multiple but related functional ontology categories within and across multiple species. The resulting networks are of significantly higher complexity than most traditional protein network models. We comprehensively benchmark our method by applying it to two largest protein networks, the Yeast and the Fly. The joint Fly-Yeast network provides substantial improvements in precision, accuracy, and false positive rate over networks that consider either of the sources in isolation. At the same time, the new model retains the computational efficiency similar to that of the simpler networks.",
keywords = "bioinformatics (genome or protein) databases, Biology and genetics, machine learning",
author = "Antonina Mitrofanova and Vladimir Pavlovic and Bhubaneswar Mishra",
year = "2011",
doi = "10.1109/TCBB.2010.15",
language = "English (US)",
volume = "8",
pages = "775--784",
journal = "IEEE/ACM Transactions on Computational Biology and Bioinformatics",
issn = "1545-5963",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "3",
}

TY - JOUR

T1 - Prediction of protein functions with gene ontology and interspecies protein homology data

AU - Mitrofanova, Antonina

AU - Pavlovic, Vladimir

AU - Mishra, Bhubaneswar

PY - 2011

Y1 - 2011

AB - Accurate computational prediction of protein functions increasingly relies on network-inspired models for the protein function transfer. This task can become challenging for proteins isolated in their own network or those with poor or uncharacterized neighborhoods. Here, we present a novel probabilistic chain-graph-based approach for predicting protein functions that builds on connecting networks of two (or more) different species by links of high interspecies sequence homology. In this way, proteins are able to exchange functional information with their neighbors-homologs from a different species. The knowledge of interspecies relationships, such as the sequence homology, can become crucial in cases of limited information from other sources of data, including the protein-protein interactions or cellular locations of proteins. We further enhance our model to account for the Gene Ontology dependencies by linking multiple but related functional ontology categories within and across multiple species. The resulting networks are of significantly higher complexity than most traditional protein network models. We comprehensively benchmark our method by applying it to two largest protein networks, the Yeast and the Fly. The joint Fly-Yeast network provides substantial improvements in precision, accuracy, and false positive rate over networks that consider either of the sources in isolation. At the same time, the new model retains the computational efficiency similar to that of the simpler networks.

KW - bioinformatics (genome or protein) databases

KW - Biology and genetics

KW - machine learning

UR - http://www.scopus.com/inward/record.url?scp=79952835504&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79952835504&partnerID=8YFLogxK

U2 - 10.1109/TCBB.2010.15

DO - 10.1109/TCBB.2010.15

M3 - Article

VL - 8

SP - 775

EP - 784

JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics

JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics

SN - 1545-5963

IS - 3

M1 - 5432154

ER -