Incorporation of unique molecular identifiers in TruSeq adapters improves the accuracy of quantitative sequencing

Jungeui Hong, David Gresham

Research output: Contribution to journal › Article

Abstract

Quantitative analysis of next-generation sequencing (NGS) data requires discriminating duplicate reads generated by PCR from identical molecules that are of unique origin. Typically, PCR duplicates are identified as sequence reads that align to the same genomic coordinates using reference-based alignment. However, identical molecules can be independently generated during library preparation. Misidentification of these molecules as PCR duplicates can introduce unforeseen biases during analyses. Here, we developed a cost-effective sequencing adapter design by modifying Illumina TruSeq adapters to incorporate a unique molecular identifier (UMI) while maintaining the capacity to undertake multiplexed, single-index sequencing. Incorporation of UMIs into TruSeq adapters (TrUMIseq adapters) enables identification of bona fide PCR duplicates as identically mapped reads with identical UMIs. Using TrUMIseq adapters, we show that accurate removal of PCR duplicates results in improved accuracy of both allele frequency (AF) estimation in heterogeneous populations using DNA sequencing and gene expression quantification using RNA-Seq.
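The distinction the abstract draws, between calling duplicates by mapping coordinates alone and calling them by coordinates plus UMI, can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Read` record, both function names, and the example reads and UMI strings are hypothetical, and UMIs are compared by exact match only, so sequencing errors within a UMI are not handled.

```python
from collections import namedtuple

# Hypothetical minimal read record; a real pipeline would parse these
# fields from an alignment file rather than construct them by hand.
Read = namedtuple("Read", ["chrom", "pos", "strand", "umi"])


def dedup_by_position(reads):
    """Keep the first read seen at each mapping coordinate (coordinate-only approach)."""
    seen = set()
    kept = []
    for read in reads:
        key = (read.chrom, read.pos, read.strand)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


def dedup_by_position_and_umi(reads):
    """Keep the first read seen for each (mapping coordinate, UMI) pair."""
    seen = set()
    kept = []
    for read in reads:
        key = (read.chrom, read.pos, read.strand, read.umi)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


# Made-up example reads; the UMI sequences are arbitrary.
reads = [
    Read("chrI", 100, "+", "ACGTAC"),
    Read("chrI", 100, "+", "ACGTAC"),  # same coordinate, same UMI: treated as a PCR duplicate
    Read("chrI", 100, "+", "TTGCAA"),  # same coordinate, different UMI: treated as an independent molecule
    Read("chrI", 250, "-", "ACGTAC"),
]

print(len(dedup_by_position(reads)))          # 2
print(len(dedup_by_position_and_umi(reads)))  # 3
```

In this toy input, coordinate-only deduplication discards the read with UMI `TTGCAA` even though it would represent an independently generated molecule, which is the kind of misidentification the abstract describes.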

Original language: English (US)
Pages (from-to): 221-226
Number of pages: 6
Journal: BioTechniques
Volume: 63
Issue number: 5
DOI: 10.2144/000114608
State: Published - Nov 1 2017

Fingerprint

Polymerase Chain Reaction
Molecules
Frequency estimation
Gene expression
RNA
DNA Sequence Analysis
Gene Frequency
DNA
Libraries
Chemical analysis
Costs
Gene Expression
Costs and Cost Analysis
Population

Keywords

  • PCR duplicates
  • RNA-Seq
  • TrUMIseq
  • TruSeq
  • Unique molecular identifier (UMI)

ASJC Scopus subject areas

  • Biotechnology
  • Biochemistry, Genetics and Molecular Biology(all)

Cite this

Incorporation of unique molecular identifiers in TruSeq adapters improves the accuracy of quantitative sequencing. / Hong, Jungeui; Gresham, David.

In: BioTechniques, Vol. 63, No. 5, 01.11.2017, p. 221-226.


@article{a4c293f493ca45a2b933f1769630b7ae,
title = "Incorporation of unique molecular identifiers in TruSeq adapters improves the accuracy of quantitative sequencing",
abstract = "Quantitative analysis of next-generation sequencing (NGS) data requires discriminating duplicate reads generated by PCR from identical molecules that are of unique origin. Typically, PCR duplicates are identified as sequence reads that align to the same genomic coordinates using reference-based alignment. However, identical molecules can be independently generated during library preparation. Misidentification of these molecules as PCR duplicates can introduce unforeseen biases during analyses. Here, we developed a cost-effective sequencing adapter design by modifying Illumina TruSeq adapters to incorporate a unique molecular identifier (UMI) while maintaining the capacity to undertake multiplexed, single-index sequencing. Incorporation of UMIs into TruSeq adapters (TrUMIseq adapters) enables identification of bona fide PCR duplicates as identically mapped reads with identical UMIs. Using TrUMIseq adapters, we show that accurate removal of PCR duplicates results in improved accuracy of both allele frequency (AF) estimation in heterogeneous populations using DNA sequencing and gene expression quantification using RNA-Seq",
keywords = "PCR duplicates, RNA-Seq, TrUMIseq, TruSeq, Unique molecular identifier (UMI)",
author = "Jungeui Hong and David Gresham",
year = "2017",
month = "11",
day = "1",
doi = "10.2144/000114608",
language = "English (US)",
volume = "63",
pages = "221--226",
journal = "BioTechniques",
issn = "0736-6205",
publisher = "Eaton Publishing Company",
number = "5",

}

TY - JOUR

T1 - Incorporation of unique molecular identifiers in TruSeq adapters improves the accuracy of quantitative sequencing

AU - Hong, Jungeui

AU - Gresham, David

PY - 2017/11/1

Y1 - 2017/11/1

AB - Quantitative analysis of next-generation sequencing (NGS) data requires discriminating duplicate reads generated by PCR from identical molecules that are of unique origin. Typically, PCR duplicates are identified as sequence reads that align to the same genomic coordinates using reference-based alignment. However, identical molecules can be independently generated during library preparation. Misidentification of these molecules as PCR duplicates can introduce unforeseen biases during analyses. Here, we developed a cost-effective sequencing adapter design by modifying Illumina TruSeq adapters to incorporate a unique molecular identifier (UMI) while maintaining the capacity to undertake multiplexed, single-index sequencing. Incorporation of UMIs into TruSeq adapters (TrUMIseq adapters) enables identification of bona fide PCR duplicates as identically mapped reads with identical UMIs. Using TrUMIseq adapters, we show that accurate removal of PCR duplicates results in improved accuracy of both allele frequency (AF) estimation in heterogeneous populations using DNA sequencing and gene expression quantification using RNA-Seq

KW - PCR duplicates

KW - RNA-Seq

KW - TrUMIseq

KW - TruSeq

KW - Unique molecular identifier (UMI)

UR - http://www.scopus.com/inward/record.url?scp=85037525380&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85037525380&partnerID=8YFLogxK

U2 - 10.2144/000114608

DO - 10.2144/000114608

M3 - Article

VL - 63

SP - 221

EP - 226

JO - BioTechniques

JF - BioTechniques

SN - 0736-6205

IS - 5

ER -