Multi-study inference of regulatory networks for more accurate models of gene regulation

Dayanne M. Castro, Nicholas R. de Veaux, Emily R. Miraldi, Richard Bonneau

Research output: Contribution to journal › Article

Abstract

Gene regulatory networks are composed of sub-networks that are often shared across biological processes, cell types, and organisms. Leveraging multiple sources of information, such as publicly available gene expression datasets, could therefore be helpful when learning a network of interest. Integrating data across different studies, however, raises numerous technical concerns. Hence, a common approach in network inference, and in genomics research more broadly, is to learn a separate model from each dataset and then combine the results. Individual models, however, often suffer from under-sampling, poor generalization, and limited network recovery. In this study, we explore previous integration strategies, such as batch correction and model ensembles, and introduce a new multitask learning approach for joint network inference across several datasets. Our method first estimates the activities of transcription factors and then infers the relevant network topology. Because regulatory interactions are context-dependent, we estimate model coefficients as a combination of dataset-specific and conserved components. In addition, adaptive penalties may be used to favor models that include interactions derived from multiple sources of prior knowledge, including orthogonal genomics experiments. We evaluate generalization and network recovery using examples from Bacillus subtilis and Saccharomyces cerevisiae, and show that sharing information across models improves network reconstruction. Finally, we demonstrate robustness to both false positives in the prior information and heterogeneity among datasets.
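The two-step idea in the abstract (estimate transcription factor activities, then fit coefficients that split into a conserved and a dataset-specific part) can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes that activities are obtained by least squares from a known prior matrix, and that the conserved part is encouraged by a row-wise l1/l2 (group) penalty while the dataset-specific part uses a plain l1 penalty; the abstract does not specify either choice. All function names, the toy data, and the penalty values are hypothetical, and the adaptive, prior-based penalties mentioned in the abstract are left out.

```python
import numpy as np


def estimate_tf_activities(expression, prior):
    """Solve expression ~ prior @ activities in the least-squares sense.

    expression: genes x samples; prior: genes x TFs (assumed known interactions).
    Returns activities with shape TFs x samples.
    """
    activities, *_ = np.linalg.lstsq(prior, expression, rcond=None)
    return activities


def soft_threshold(values, amount):
    """Element-wise shrinkage: the proximal operator of the l1 penalty."""
    return np.sign(values) * np.maximum(np.abs(values) - amount, 0.0)


def group_soft_threshold(rows, amount):
    """Shrink each row jointly: the proximal operator of the l1/l2 penalty."""
    norms = np.linalg.norm(rows, axis=1, keepdims=True)
    scale = np.maximum(1.0 - amount / np.maximum(norms, 1e-12), 0.0)
    return rows * scale


def squared_loss(designs, responses, weights):
    """Sum over datasets of the halved mean squared residual."""
    return sum(
        0.5 * np.mean((X @ weights[:, k] - y) ** 2)
        for k, (X, y) in enumerate(zip(designs, responses))
    )


def fit_shared_plus_specific(designs, responses, lam_shared, lam_specific, n_iter=500):
    """Proximal gradient descent for weights = shared + specific.

    Each column holds the coefficients of one dataset. Rows of `shared` are
    penalized jointly across datasets; entries of `specific` individually.
    """
    n_features, n_tasks = designs[0].shape[1], len(designs)
    shared = np.zeros((n_features, n_tasks))
    specific = np.zeros((n_features, n_tasks))
    # Both parts see the same gradient, which doubles the Lipschitz constant.
    lipschitz = max(np.linalg.norm(X, 2) ** 2 / X.shape[0] for X in designs)
    step = 1.0 / (2.0 * lipschitz)
    for _ in range(n_iter):
        grad = np.zeros_like(shared)
        for k, (X, y) in enumerate(zip(designs, responses)):
            residual = X @ (shared[:, k] + specific[:, k]) - y
            grad[:, k] = X.T @ residual / X.shape[0]
        shared = group_soft_threshold(shared - step * grad, step * lam_shared)
        specific = soft_threshold(specific - step * grad, step * lam_specific)
    return shared, specific


rng = np.random.default_rng(0)

# Step 1: recover TF activities from a toy prior in which each gene has one regulator.
prior = np.tile(np.eye(4), (5, 1))           # 20 genes x 4 TFs
true_activities = rng.normal(size=(4, 6))    # 4 TFs x 6 samples
expression = prior @ true_activities
activities = estimate_tf_activities(expression, prior)

# Step 2: two toy datasets for one target gene. TF 0 acts in both datasets,
# TF 1 acts only in the second one.
true_weights = np.array([[2.0, 2.0], [0.0, -1.5], [0.0, 0.0], [0.0, 0.0]])
designs = [rng.normal(size=(40, 4)), rng.normal(size=(50, 4))]
responses = [
    X @ true_weights[:, k] + 0.1 * rng.normal(size=X.shape[0])
    for k, X in enumerate(designs)
]
shared, specific = fit_shared_plus_specific(designs, responses, 0.05, 0.05)
weights = shared + specific  # per-dataset coefficients, one column per dataset
```

In this sketch only the sum `weights` is used for prediction. The split between `shared` and `specific` is driven by the two penalty strengths, so raising `lam_specific` relative to `lam_shared` pushes more of the signal into the conserved part.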

Original language: English (US)
Article number: e1006591
Journal: PLoS Computational Biology
Volume: 15
Issue number: 1
DOI: 10.1371/journal.pcbi.1006591
State: Published - Jan 1 2019

Fingerprint

Gene Regulation
Regulatory Networks
Gene expression
Genes
Genomics
Learning
Biological Phenomena
Model
Information Dissemination
Gene Regulatory Networks
Recovery
Bacillus subtilis
Multi-task Learning
Transcription factors
Bacilli
Saccharomyces cerevisiae

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics
  • Modeling and Simulation
  • Ecology
  • Molecular Biology
  • Genetics
  • Cellular and Molecular Neuroscience
  • Computational Theory and Mathematics

Cite this

Multi-study inference of regulatory networks for more accurate models of gene regulation. / Castro, Dayanne M.; de Veaux, Nicholas R.; Miraldi, Emily R.; Bonneau, Richard.

In: PLoS Computational Biology, Vol. 15, No. 1, e1006591, 01.01.2019.

@article{f9bf2d9b7a3f4b98a7ead60bcd15372a,
title = "Multi-study inference of regulatory networks for more accurate models of gene regulation",
abstract = "Gene regulatory networks are composed of sub-networks that are often shared across biological processes, cell-types, and organisms. Leveraging multiple sources of information, such as publicly available gene expression datasets, could therefore be helpful when learning a network of interest. Integrating data across different studies, however, raises numerous technical concerns. Hence, a common approach in network inference, and broadly in genomics research, is to separately learn models from each dataset and combine the results. Individual models, however, often suffer from under-sampling, poor generalization and limited network recovery. In this study, we explore previous integration strategies, such as batch-correction and model ensembles, and introduce a new multitask learning approach for joint network inference across several datasets. Our method initially estimates the activities of transcription factors, and subsequently, infers the relevant network topology. As regulatory interactions are context-dependent, we estimate model coefficients as a combination of both dataset-specific and conserved components. In addition, adaptive penalties may be used to favor models that include interactions derived from multiple sources of prior knowledge including orthogonal genomics experiments. We evaluate generalization and network recovery using examples from Bacillus subtilis and Saccharomyces cerevisiae, and show that sharing information across models improves network reconstruction. Finally, we demonstrate robustness to both false positives in the prior information and heterogeneity among datasets.",
author = "Castro, {Dayanne M.} and {de Veaux}, {Nicholas R.} and Miraldi, {Emily R.} and Bonneau, Richard",
year = "2019",
month = "1",
day = "1",
doi = "10.1371/journal.pcbi.1006591",
language = "English (US)",
volume = "15",
journal = "PLoS Computational Biology",
issn = "1553-734X",
publisher = "Public Library of Science",
number = "1",

}

TY - JOUR
T1 - Multi-study inference of regulatory networks for more accurate models of gene regulation
AU - Castro, Dayanne M.
AU - de Veaux, Nicholas R.
AU - Miraldi, Emily R.
AU - Bonneau, Richard
PY - 2019/1/1
Y1 - 2019/1/1
AB - Gene regulatory networks are composed of sub-networks that are often shared across biological processes, cell-types, and organisms. Leveraging multiple sources of information, such as publicly available gene expression datasets, could therefore be helpful when learning a network of interest. Integrating data across different studies, however, raises numerous technical concerns. Hence, a common approach in network inference, and broadly in genomics research, is to separately learn models from each dataset and combine the results. Individual models, however, often suffer from under-sampling, poor generalization and limited network recovery. In this study, we explore previous integration strategies, such as batch-correction and model ensembles, and introduce a new multitask learning approach for joint network inference across several datasets. Our method initially estimates the activities of transcription factors, and subsequently, infers the relevant network topology. As regulatory interactions are context-dependent, we estimate model coefficients as a combination of both dataset-specific and conserved components. In addition, adaptive penalties may be used to favor models that include interactions derived from multiple sources of prior knowledge including orthogonal genomics experiments. We evaluate generalization and network recovery using examples from Bacillus subtilis and Saccharomyces cerevisiae, and show that sharing information across models improves network reconstruction. Finally, we demonstrate robustness to both false positives in the prior information and heterogeneity among datasets.
UR - http://www.scopus.com/inward/record.url?scp=85061162782&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85061162782&partnerID=8YFLogxK
U2 - 10.1371/journal.pcbi.1006591
DO - 10.1371/journal.pcbi.1006591
M3 - Article
C2 - 30677040
AN - SCOPUS:85061162782
VL - 15
JO - PLoS Computational Biology
JF - PLoS Computational Biology
SN - 1553-734X
IS - 1
M1 - e1006591
ER -