Predicting malaria interactome classifications from time-course transcriptomic data along the intraerythrocytic developmental cycle

Antonina Mitrofanova, Samantha Kleinberg, Jane Carlton, Simon Kasif, Bud Mishra

Research output: Contribution to journal › Article

Abstract

Objective: Even though a vaccine for malaria infections has been under intense study for many years, it has resisted several different lines of attack attempted by biologists. More than half of Plasmodium proteins still remain uncharacterized and therefore cannot be used in clinical trials. The task is further complicated by the metamorphic life-cycle of the parasite, which allows for rapid evolutionary changes and diversity among related strains, thus making precise targeting of the appropriate proteins for vaccination a technical challenge. We propose an automated method for predicting functions for the malaria parasite, which capitalizes on the importance of the intraerythrocytic developmental cycle data and expression changes during its five phases, as determined computationally by our segmentation algorithm.

Materials and methods: Our method combines temporal gene expression profiles with protein-protein interaction data, sequence similarity scores, and metabolic pathway information to produce a set of predicted protein functions that can be used as targets for vaccine development. We use a Bayesian approach, which assigns a probability of having (or not having) a particular function to each protein, given the various sources of evidence. In our method, each data source is represented by either a functional linkage graph or a categorical feature vector.

Results and conclusions: The methods are tested on Plasmodium falciparum, the species responsible for the deadliest malaria infections. The algorithm was able to assign meaningful functions to 628 out of 1439 previously unannotated proteins, which are first-choice candidates for experimental vaccine research. We conclude that analyzing time-course gene expression profiles in separate phases leads to much higher prediction accuracy when compared with Pearson correlation coefficients computed across the time course as a whole. Additionally, we demonstrate that temporal expression profiles alone are able to improve the predictive power of the integrated data.
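The abstract's comparison between phase-wise analysis and a single Pearson correlation over the whole time course can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy expression profiles, and the two phase boundaries below are all hypothetical, and the paper's phases come from its own segmentation algorithm, which is not reproduced here. The sketch only shows why two genes that are strongly related within each phase can look weakly correlated when the phases are pooled.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # convention chosen here for flat profiles (assumption)
    return cov / sqrt(var_x * var_y)


def phase_correlations(x, y, boundaries):
    """Correlate two profiles separately inside each phase.

    `boundaries` is a list of half-open (start, end) index pairs. In this
    sketch they are supplied by hand; they stand in for the output of a
    segmentation step.
    """
    return [pearson(x[s:e], y[s:e]) for s, e in boundaries]


# Toy profiles (made-up numbers): the genes rise together in the first
# phase and move in opposite directions in the second.
gene_a = [1, 2, 3, 4, 4, 3, 2, 1]
gene_b = [2, 4, 6, 8, 1, 2, 3, 4]
phases = [(0, 4), (4, 8)]  # hypothetical phase boundaries

whole = pearson(gene_a, gene_b)
by_phase = phase_correlations(gene_a, gene_b, phases)

print("whole time course:", round(whole, 3))
print("per phase:", [round(r, 3) for r in by_phase])
```

On this toy input the per-phase coefficients are strong and of opposite sign, while the whole-course coefficient is weak. That is the kind of phase-specific signal a single pooled correlation averages away.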

Original language: English (US)
Pages (from-to): 167-176
Number of pages: 10
Journal: Artificial Intelligence in Medicine
Volume: 49
Issue number: 3
DOI: 10.1016/j.artmed.2010.04.013
State: Published - Jul 2010

Fingerprint

Malaria
Proteins
Vaccines
Transcriptome
Gene expression
Parasites
Malaria Vaccines
Bayes Theorem
Plasmodium
Information Storage and Retrieval
Protein Transport
Plasmodium falciparum
Metabolic Networks and Pathways
Infection
Life Cycle Stages
Vaccination
Life cycle
Clinical Trials
Research

Keywords

  • Bayesian probabilistic approach
  • Intraerythrocytic developmental cycle
  • N-terminal host targeting motif
  • Pexel
  • Plasmodium falciparum
  • Protein function prediction
  • Red blood cell membrane proteins
  • Time-course gene expression data

ASJC Scopus subject areas

  • Artificial Intelligence
  • Medicine (miscellaneous)
  • Medicine (all)

Cite this

Predicting malaria interactome classifications from time-course transcriptomic data along the intraerythrocytic developmental cycle. / Mitrofanova, Antonina; Kleinberg, Samantha; Carlton, Jane; Kasif, Simon; Mishra, Bud.

In: Artificial Intelligence in Medicine, Vol. 49, No. 3, 07.2010, p. 167-176.

@article{6b3075263fe24689b4b6f5ed2009b833,
title = "Predicting malaria interactome classifications from time-course transcriptomic data along the intraerythrocytic developmental cycle",
keywords = "Bayesian probabilistic approach, Intraerythrocytic developmental cycle, N-terminal host targeting motif, Pexel, Plasmodium falciparum, Protein function prediction, Red blood cell membrane proteins, Time-course gene expression data",
author = "Antonina Mitrofanova and Samantha Kleinberg and Jane Carlton and Simon Kasif and Bud Mishra",
year = "2010",
month = "7",
doi = "10.1016/j.artmed.2010.04.013",
language = "English (US)",
volume = "49",
pages = "167--176",
journal = "Artificial Intelligence in Medicine",
issn = "0933-3657",
publisher = "Elsevier",
number = "3",

}

TY - JOUR

T1 - Predicting malaria interactome classifications from time-course transcriptomic data along the intraerythrocytic developmental cycle

AU - Mitrofanova, Antonina

AU - Kleinberg, Samantha

AU - Carlton, Jane

AU - Kasif, Simon

AU - Mishra, Bud

PY - 2010/7

Y1 - 2010/7

KW - Bayesian probabilistic approach

KW - Intraerythrocytic developmental cycle

KW - N-terminal host targeting motif

KW - Pexel

KW - Plasmodium falciparum

KW - Protein function prediction

KW - Red blood cell membrane proteins

KW - Time-course gene expression data

UR - http://www.scopus.com/inward/record.url?scp=77954315665&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77954315665&partnerID=8YFLogxK

U2 - 10.1016/j.artmed.2010.04.013

DO - 10.1016/j.artmed.2010.04.013

M3 - Article

VL - 49

SP - 167

EP - 176

JO - Artificial Intelligence in Medicine

JF - Artificial Intelligence in Medicine

SN - 0933-3657

IS - 3

ER -