Lineage-based identification of cellular states and expression programs

Tatsunori Hashimoto, Tommi Jaakkola, Richard Sherwood, Esteban O. Mazzoni, Hynek Wichterle, David Gifford

Research output: Contribution to journal › Article

Abstract

We present a method, LineageProgram, that uses the developmental lineage relationship of observed gene expression measurements to improve the learning of developmentally relevant cellular states and expression programs. We find that incorporating lineage information allows us to significantly improve both the predictive power and interpretability of expression programs that are derived from expression measurements from in vitro differentiation experiments. The lineage tree of a differentiation experiment is a tree graph whose nodes describe all of the unique expression states in the input expression measurements, and edges describe the experimental perturbations applied to cells. Our method, LineageProgram, is based on a log-linear model with parameters that reflect changes along the lineage tree. Regularization with L1-based methods controls the parameters in three distinct ways: the number of genes that change between two cellular states, the number of unique cellular states, and the number of underlying factors responsible for changes in cell state. The model is estimated with proximal operators to quickly discover a small number of key cell states and gene sets. Comparisons with existing factorization techniques, such as singular value decomposition and non-negative matrix factorization, show that our method provides higher predictive power in held-out tests while inducing sparse and biologically relevant gene sets.
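
The abstract states that the L1-regularized model is estimated with proximal operators. As a rough illustration only, and not the authors' implementation, the proximal operator of a plain L1 penalty is elementwise soft-thresholding, which sets small parameters exactly to zero. The function names, the toy values in `edge_changes`, and the threshold below are all hypothetical; the actual LineageProgram penalties (on changed genes, cell states, and factors) are not specified in this record.

```python
def soft_threshold(values, threshold):
    """Proximal operator of threshold * ||x||_1, applied elementwise."""
    result = []
    for v in values:
        if v > threshold:
            result.append(v - threshold)   # shrink positive values toward zero
        elif v < -threshold:
            result.append(v + threshold)   # shrink negative values toward zero
        else:
            result.append(0.0)             # small values become exactly zero
    return result


def proximal_gradient_step(params, gradient, step_size, l1_weight):
    """One proximal gradient update: a gradient step on the smooth loss,
    followed by soft-thresholding with threshold step_size * l1_weight."""
    moved = [p - step_size * g for p, g in zip(params, gradient)]
    return soft_threshold(moved, step_size * l1_weight)


# Hypothetical per-gene expression changes along one lineage edge (assumed values).
edge_changes = [2.5, -0.3, 0.1, -1.8]
sparse_changes = soft_threshold(edge_changes, 0.5)
print(sparse_changes)
```

In this toy case the two small entries are zeroed while the two large ones are shrunk, which is the sense in which an L1 penalty can limit the number of genes that change between two states.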

Original language: English (US)
Article number: bts204
Journal: Bioinformatics
Volume: 28
Issue number: 12
DOI: https://doi.org/10.1093/bioinformatics/bts204
State: Published - Jun 2012

Fingerprint

Genes
Factorization
Gene
Singular value decomposition
Cell
Gene expression
Experiments
Non-negative Matrix Factorization
Log-linear Models
Interpretability
Linear Models
Cell Count
Learning
Gene Expression
Experiment
Regularization
Perturbation
Distinct
Graph in graph theory
Vertex of a graph

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computational Theory and Mathematics
  • Computer Science Applications
  • Computational Mathematics
  • Statistics and Probability
  • Medicine(all)

Cite this

Hashimoto, T., Jaakkola, T., Sherwood, R., Mazzoni, E. O., Wichterle, H., & Gifford, D. (2012). Lineage-based identification of cellular states and expression programs. Bioinformatics, 28(12), [bts204]. https://doi.org/10.1093/bioinformatics/bts204

@article{898f3a9cea9543c7af9d4cb603ca9bd7,
title = "Lineage-based identification of cellular states and expression programs",
abstract = "We present a method, LineageProgram, that uses the developmental lineage relationship of observed gene expression measurements to improve the learning of developmentally relevant cellular states and expression programs. We find that incorporating lineage information allows us to significantly improve both the predictive power and interpretability of expression programs that are derived from expression measurements from in vitro differentiation experiments. The lineage tree of a differentiation experiment is a tree graph whose nodes describe all of the unique expression states in the input expression measurements, and edges describe the experimental perturbations applied to cells. Our method, LineageProgram, is based on a log-linear model with parameters that reflect changes along the lineage tree. Regularization with L1-based methods controls the parameters in three distinct ways: the number of genes that change between two cellular states, the number of unique cellular states, and the number of underlying factors responsible for changes in cell state. The model is estimated with proximal operators to quickly discover a small number of key cell states and gene sets. Comparisons with existing factorization techniques, such as singular value decomposition and non-negative matrix factorization, show that our method provides higher predictive power in held-out tests while inducing sparse and biologically relevant gene sets.",
author = "Tatsunori Hashimoto and Tommi Jaakkola and Richard Sherwood and Mazzoni, {Esteban O.} and Hynek Wichterle and David Gifford",
year = "2012",
month = "6",
doi = "10.1093/bioinformatics/bts204",
language = "English (US)",
volume = "28",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "12",

}

TY - JOUR

T1 - Lineage-based identification of cellular states and expression programs

AU - Hashimoto, Tatsunori

AU - Jaakkola, Tommi

AU - Sherwood, Richard

AU - Mazzoni, Esteban O.

AU - Wichterle, Hynek

AU - Gifford, David

PY - 2012/6

Y1 - 2012/6

N2 - We present a method, LineageProgram, that uses the developmental lineage relationship of observed gene expression measurements to improve the learning of developmentally relevant cellular states and expression programs. We find that incorporating lineage information allows us to significantly improve both the predictive power and interpretability of expression programs that are derived from expression measurements from in vitro differentiation experiments. The lineage tree of a differentiation experiment is a tree graph whose nodes describe all of the unique expression states in the input expression measurements, and edges describe the experimental perturbations applied to cells. Our method, LineageProgram, is based on a log-linear model with parameters that reflect changes along the lineage tree. Regularization with L1-based methods controls the parameters in three distinct ways: the number of genes that change between two cellular states, the number of unique cellular states, and the number of underlying factors responsible for changes in cell state. The model is estimated with proximal operators to quickly discover a small number of key cell states and gene sets. Comparisons with existing factorization techniques, such as singular value decomposition and non-negative matrix factorization, show that our method provides higher predictive power in held-out tests while inducing sparse and biologically relevant gene sets.

UR - http://www.scopus.com/inward/record.url?scp=84863533720&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84863533720&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/bts204

DO - 10.1093/bioinformatics/bts204

M3 - Article

VL - 28

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 12

M1 - bts204

ER -