Integrating single-cell transcriptomic data across different conditions, technologies, and species

Andrew Butler, Paul Hoffman, Peter Smibert, Efthymia Papalexi, Rahul Satija

Research output: Contribution to journal (Article)

Abstract

Computational single-cell RNA-seq (scRNA-seq) methods have been successfully applied to experiments representing a single condition, technology, or species to discover and define cellular phenotypes. However, identifying subpopulations of cells that are present across multiple data sets remains challenging. Here, we introduce an analytical strategy for integrating scRNA-seq data sets based on common sources of variation, enabling the identification of shared populations across data sets and downstream comparative analysis. We apply this approach, implemented in our R toolkit Seurat (http://satijalab.org/seurat/), to align scRNA-seq data sets of peripheral blood mononuclear cells under resting and stimulated conditions, hematopoietic progenitors sequenced using two profiling technologies, and pancreatic cell 'atlases' generated from human and mouse islets. In each case, we learn distinct or transitional cell states jointly across data sets, while boosting statistical power through integrated analysis. Our approach facilitates general comparisons of scRNA-seq data sets, potentially deepening our understanding of how distinct cell states respond to perturbation, disease, and evolution.

Original language: English (US)
Pages (from-to): 411-420
Number of pages: 10
Journal: Nature Biotechnology
Volume: 36
Issue number: 5
DOI: 10.1038/nbt.4096
State: Published - Jun 1 2018

Fingerprint

RNA
Technology
Blood
Cells
Atlases
Datasets
Blood Cells
Experiments
Phenotype
Population

ASJC Scopus subject areas

  • Biotechnology
  • Bioengineering
  • Applied Microbiology and Biotechnology
  • Molecular Medicine
  • Biomedical Engineering

Cite this

Integrating single-cell transcriptomic data across different conditions, technologies, and species. / Butler, Andrew; Hoffman, Paul; Smibert, Peter; Papalexi, Efthymia; Satija, Rahul.

In: Nature Biotechnology, Vol. 36, No. 5, 01.06.2018, p. 411-420.

@article{2abcb4d3d4c3431a853b7db072084920,
title = "Integrating single-cell transcriptomic data across different conditions, technologies, and species",
abstract = "Computational single-cell RNA-seq (scRNA-seq) methods have been successfully applied to experiments representing a single condition, technology, or species to discover and define cellular phenotypes. However, identifying subpopulations of cells that are present across multiple data sets remains challenging. Here, we introduce an analytical strategy for integrating scRNA-seq data sets based on common sources of variation, enabling the identification of shared populations across data sets and downstream comparative analysis. We apply this approach, implemented in our R toolkit Seurat (http://satijalab.org/seurat/), to align scRNA-seq data sets of peripheral blood mononuclear cells under resting and stimulated conditions, hematopoietic progenitors sequenced using two profiling technologies, and pancreatic cell 'atlases' generated from human and mouse islets. In each case, we learn distinct or transitional cell states jointly across data sets, while boosting statistical power through integrated analysis. Our approach facilitates general comparisons of scRNA-seq data sets, potentially deepening our understanding of how distinct cell states respond to perturbation, disease, and evolution.",
author = "Andrew Butler and Paul Hoffman and Peter Smibert and Efthymia Papalexi and Rahul Satija",
year = "2018",
month = "6",
day = "1",
doi = "10.1038/nbt.4096",
language = "English (US)",
volume = "36",
pages = "411--420",
journal = "Nature Biotechnology",
issn = "1087-0156",
publisher = "Nature Publishing Group",
number = "5"
}

TY - JOUR

T1 - Integrating single-cell transcriptomic data across different conditions, technologies, and species

AU - Butler, Andrew

AU - Hoffman, Paul

AU - Smibert, Peter

AU - Papalexi, Efthymia

AU - Satija, Rahul

PY - 2018/6/1

Y1 - 2018/6/1

N2 - Computational single-cell RNA-seq (scRNA-seq) methods have been successfully applied to experiments representing a single condition, technology, or species to discover and define cellular phenotypes. However, identifying subpopulations of cells that are present across multiple data sets remains challenging. Here, we introduce an analytical strategy for integrating scRNA-seq data sets based on common sources of variation, enabling the identification of shared populations across data sets and downstream comparative analysis. We apply this approach, implemented in our R toolkit Seurat (http://satijalab.org/seurat/), to align scRNA-seq data sets of peripheral blood mononuclear cells under resting and stimulated conditions, hematopoietic progenitors sequenced using two profiling technologies, and pancreatic cell 'atlases' generated from human and mouse islets. In each case, we learn distinct or transitional cell states jointly across data sets, while boosting statistical power through integrated analysis. Our approach facilitates general comparisons of scRNA-seq data sets, potentially deepening our understanding of how distinct cell states respond to perturbation, disease, and evolution.

UR - http://www.scopus.com/inward/record.url?scp=85046298440&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85046298440&partnerID=8YFLogxK

U2 - 10.1038/nbt.4096

DO - 10.1038/nbt.4096

M3 - Article

VL - 36

SP - 411

EP - 420

JO - Nature Biotechnology

JF - Nature Biotechnology

SN - 1087-0156

IS - 5

ER -