Data aggregation at the level of molecular pathways improves stability of experimental transcriptomic and proteomic data

Nicolas Borisov, Maria Suntsova, Maxim Sorokin, Andrew Garazha, Olga Kovalchuk, Alexander Aliper, Elena Ilnitskaya, Ksenia Lezhnina, Mikhail Korzinkin, Victor Tkachev, Vyacheslav Saenko, Yury Saenko, Dmitry G. Sokov, Nurshat M. Gaifullin, Kirill Kashintsev, Valery Shirokorad, Irina Shabalina, Alex Zhavoronkov, Bhubaneswar Mishra, Charles R. Cantor, Anton Buzdin

Research output: Contribution to journal › Article

Abstract

High throughput technologies opened a new era in biomedicine by enabling massive analysis of gene expression at both RNA and protein levels. Unfortunately, expression data obtained in different experiments are often poorly compatible, even for the same biologic samples. Here, using experimental and bioinformatic investigation of major experimental platforms, we show that aggregation of gene expression data at the level of molecular pathways helps to diminish cross- and intra-platform bias otherwise clearly seen at the level of individual genes. We created a mathematical model of cumulative suppression of data variation that predicts the ideal parameters and the optimal size of a molecular pathway. We compared the abilities to aggregate experimental molecular data for the 5 alternative methods, also evaluated by their capacity to retain meaningful features of biologic samples. The bioinformatic method OncoFinder showed optimal performance in both tests and should be very useful for future cross-platform data analyses.

Original language: English (US)
Pages (from-to): 1810-1823
Number of pages: 14
Journal: Cell Cycle
Volume: 16
Issue number: 19
DOI: https://doi.org/10.1080/15384101.2017.1361068
State: Published - Oct 2 2017

Fingerprint

Computational Biology
Proteomics
Gene Expression
Theoretical Models
RNA
Technology
Genes
Proteins

Keywords

  • bioinformatics
  • cross-platform analysis
  • gene expression
  • mass spectrometry
  • microarray hybridization
  • next-generation sequencing
  • pathway activation strength
  • proteome
  • signaling pathways
  • transcriptome

ASJC Scopus subject areas

  • Molecular Biology
  • Developmental Biology
  • Cell Biology

Cite this

Borisov, N., Suntsova, M., Sorokin, M., Garazha, A., Kovalchuk, O., Aliper, A., ... Buzdin, A. (2017). Data aggregation at the level of molecular pathways improves stability of experimental transcriptomic and proteomic data. Cell Cycle, 16(19), 1810-1823. https://doi.org/10.1080/15384101.2017.1361068

@article{7d31f328c4a74075880decd251bb4c21,
title = "Data aggregation at the level of molecular pathways improves stability of experimental transcriptomic and proteomic data",
abstract = "High throughput technologies opened a new era in biomedicine by enabling massive analysis of gene expression at both RNA and protein levels. Unfortunately, expression data obtained in different experiments are often poorly compatible, even for the same biologic samples. Here, using experimental and bioinformatic investigation of major experimental platforms, we show that aggregation of gene expression data at the level of molecular pathways helps to diminish cross- and intra-platform bias otherwise clearly seen at the level of individual genes. We created a mathematical model of cumulative suppression of data variation that predicts the ideal parameters and the optimal size of a molecular pathway. We compared the abilities to aggregate experimental molecular data for the 5 alternative methods, also evaluated by their capacity to retain meaningful features of biologic samples. The bioinformatic method OncoFinder showed optimal performance in both tests and should be very useful for future cross-platform data analyses.",
keywords = "bioinformatics, cross-platform analysis, gene expression, mass spectrometry, microarray hybridization, next-generation sequencing, pathway activation strength, proteome, signaling pathways, transcriptome",
author = "Nicolas Borisov and Maria Suntsova and Maxim Sorokin and Andrew Garazha and Olga Kovalchuk and Alexander Aliper and Elena Ilnitskaya and Ksenia Lezhnina and Mikhail Korzinkin and Victor Tkachev and Vyacheslav Saenko and Yury Saenko and Sokov, {Dmitry G.} and Gaifullin, {Nurshat M.} and Kirill Kashintsev and Valery Shirokorad and Irina Shabalina and Alex Zhavoronkov and Bhubaneswar Mishra and Cantor, {Charles R.} and Anton Buzdin",
year = "2017",
month = "10",
day = "2",
doi = "10.1080/15384101.2017.1361068",
language = "English (US)",
volume = "16",
pages = "1810--1823",
journal = "Cell Cycle",
issn = "1538-4101",
publisher = "Taylor and Francis",
number = "19",

}

TY  - JOUR
T1  - Data aggregation at the level of molecular pathways improves stability of experimental transcriptomic and proteomic data
AU  - Borisov, Nicolas
AU  - Suntsova, Maria
AU  - Sorokin, Maxim
AU  - Garazha, Andrew
AU  - Kovalchuk, Olga
AU  - Aliper, Alexander
AU  - Ilnitskaya, Elena
AU  - Lezhnina, Ksenia
AU  - Korzinkin, Mikhail
AU  - Tkachev, Victor
AU  - Saenko, Vyacheslav
AU  - Saenko, Yury
AU  - Sokov, Dmitry G.
AU  - Gaifullin, Nurshat M.
AU  - Kashintsev, Kirill
AU  - Shirokorad, Valery
AU  - Shabalina, Irina
AU  - Zhavoronkov, Alex
AU  - Mishra, Bhubaneswar
AU  - Cantor, Charles R.
AU  - Buzdin, Anton
PY  - 2017/10/2
Y1  - 2017/10/2
AB  - High throughput technologies opened a new era in biomedicine by enabling massive analysis of gene expression at both RNA and protein levels. Unfortunately, expression data obtained in different experiments are often poorly compatible, even for the same biologic samples. Here, using experimental and bioinformatic investigation of major experimental platforms, we show that aggregation of gene expression data at the level of molecular pathways helps to diminish cross- and intra-platform bias otherwise clearly seen at the level of individual genes. We created a mathematical model of cumulative suppression of data variation that predicts the ideal parameters and the optimal size of a molecular pathway. We compared the abilities to aggregate experimental molecular data for the 5 alternative methods, also evaluated by their capacity to retain meaningful features of biologic samples. The bioinformatic method OncoFinder showed optimal performance in both tests and should be very useful for future cross-platform data analyses.
KW  - bioinformatics
KW  - cross-platform analysis
KW  - gene expression
KW  - mass spectrometry
KW  - microarray hybridization
KW  - next-generation sequencing
KW  - pathway activation strength
KW  - proteome
KW  - signaling pathways
KW  - transcriptome
UR  - http://www.scopus.com/inward/record.url?scp=85029909952&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85029909952&partnerID=8YFLogxK
U2  - 10.1080/15384101.2017.1361068
DO  - 10.1080/15384101.2017.1361068
M3  - Article
VL  - 16
SP  - 1810
EP  - 1823
JO  - Cell Cycle
JF  - Cell Cycle
SN  - 1538-4101
IS  - 19
ER  - 