A novel signal processing approach for the detection of copy number variations in the human genome

Catherine Stamoulis, Rebecca Betensky

Research output: Contribution to journal, Article

Abstract

Motivation: Human genomic variability occurs at different scales, from single nucleotide polymorphisms (SNPs) to large DNA segments. Copy number variations (CNVs) represent a significant part of our genetic heterogeneity and have also been associated with many diseases and disorders. Short, localized CNVs, which may play an important role in human disease, may be undetectable in noisy genomic data. Therefore, robust methodologies are needed for their detection. Furthermore, for meaningful identification of pathological CNVs, estimation of normal allelic aberrations is necessary. Results: We developed a signal processing-based methodology for sequence denoising followed by pattern matching, to increase SNR in genomic data and improve CNV detection. We applied this signal-decomposition-matched filtering (SDMF) methodology to 429 normal genomic sequences, and compared detected CNVs to those in the Database of Genomic Variants. SDMF successfully detected a significant number of previously identified CNVs with frequencies of occurrence ≥10%, as well as unreported short CNVs. Its performance was also compared to circular binary segmentation (CBS) through simulations. SDMF had a significantly lower false detection rate and was significantly faster than CBS, an important advantage for handling large datasets generated with high-resolution arrays. By focusing on improving SNR (instead of the robustness of the detection algorithm), SDMF is a very promising methodology for identifying CNVs at all genomic spatial scales.
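The general idea of denoising a profile and then matching it against a pattern can be illustrated with a toy example. The sketch below is not the authors' SDMF implementation: the abstract does not specify the decomposition or the template, so a centered moving average stands in for the denoising step and a boxcar stands in for the pattern. All function names, the toy profile, and the threshold are hypothetical choices made only for illustration.

```python
def moving_average(signal, width):
    """Smooth with a centered moving average (a stand-in for the denoising step)."""
    half = width // 2
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def matched_filter(signal, template):
    """Slide the template along the signal and return the correlation at each offset."""
    m = len(template)
    return [
        sum(s * t for s, t in zip(signal[i:i + m], template))
        for i in range(len(signal) - m + 1)
    ]


def detect(signal, template, threshold, smooth_width=3):
    """Return start offsets where the matched-filter output reaches the threshold."""
    scores = matched_filter(moving_average(signal, smooth_width), template)
    return [i for i, score in enumerate(scores) if score >= threshold]


# Toy profile (assumed data): zero baseline with alternating +/-0.1 noise,
# plus a 5-probe gain of height 1.0 starting at index 20.
profile = [0.1 if i % 2 else -0.1 for i in range(40)]
for i in range(20, 25):
    profile[i] += 1.0

template = [1.0] * 5  # hypothetical boxcar template for a short gain
hits = detect(profile, template, threshold=4.0)
print(hits)  # [20]
```

The matched-filter score peaks where the template lines up with the gain, so only the offset at index 20 passes the threshold in this toy profile. In practice the template shape and threshold would have to be chosen from the data and the array platform.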

Original language: English (US)
Article number: btr402
Pages (from-to): 2338-2345
Number of pages: 8
Journal: Bioinformatics
Volume: 27
Issue number: 17
DOI: 10.1093/bioinformatics/btr402
State: Published - Sep 1 2011

Fingerprint

Human Genome
Signal Processing
Genome
Genes
Genomics
Genetic Heterogeneity
Pattern matching
Nucleotides
Polymorphism
Aberrations
Single Nucleotide Polymorphism
DNA
Databases
Decomposition
Methodology
Segmentation
Binary
Human

ASJC Scopus subject areas

  • Statistics and Probability
  • Medicine (all)
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

A novel signal processing approach for the detection of copy number variations in the human genome. / Stamoulis, Catherine; Betensky, Rebecca.

In: Bioinformatics, Vol. 27, No. 17, btr402, 01.09.2011, p. 2338-2345.


@article{9335808819c44957b023b598c2ab933e,
title = "A novel signal processing approach for the detection of copy number variations in the human genome",
abstract = "Motivation: Human genomic variability occurs at different scales, from single nucleotide polymorphisms (SNPs) to large DNA segments. Copy number variations (CNVs) represent a significant part of our genetic heterogeneity and have also been associated with many diseases and disorders. Short, localized CNVs, which may play an important role in human disease, may be undetectable in noisy genomic data. Therefore, robust methodologies are needed for their detection. Furthermore, for meaningful identification of pathological CNVs, estimation of normal allelic aberrations is necessary. Results: We developed a signal processing-based methodology for sequence denoising followed by pattern matching, to increase SNR in genomic data and improve CNV detection. We applied this signal-decomposition-matched filtering (SDMF) methodology to 429 normal genomic sequences, and compared detected CNVs to those in the Database of Genomic Variants. SDMF successfully detected a significant number of previously identified CNVs with frequencies of occurrence ≥10{\%}, as well as unreported short CNVs. Its performance was also compared to circular binary segmentation (CBS) through simulations. SDMF had a significantly lower false detection rate and was significantly faster than CBS, an important advantage for handling large datasets generated with high-resolution arrays. By focusing on improving SNR (instead of the robustness of the detection algorithm), SDMF is a very promising methodology for identifying CNVs at all genomic spatial scales.",
author = "Catherine Stamoulis and Rebecca Betensky",
year = "2011",
month = "9",
day = "1",
doi = "10.1093/bioinformatics/btr402",
language = "English (US)",
volume = "27",
pages = "2338--2345",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "17",

}

TY - JOUR

T1 - A novel signal processing approach for the detection of copy number variations in the human genome

AU - Stamoulis, Catherine

AU - Betensky, Rebecca

PY - 2011/9/1

Y1 - 2011/9/1

N2 - Motivation: Human genomic variability occurs at different scales, from single nucleotide polymorphisms (SNPs) to large DNA segments. Copy number variations (CNVs) represent a significant part of our genetic heterogeneity and have also been associated with many diseases and disorders. Short, localized CNVs, which may play an important role in human disease, may be undetectable in noisy genomic data. Therefore, robust methodologies are needed for their detection. Furthermore, for meaningful identification of pathological CNVs, estimation of normal allelic aberrations is necessary. Results: We developed a signal processing-based methodology for sequence denoising followed by pattern matching, to increase SNR in genomic data and improve CNV detection. We applied this signal-decomposition-matched filtering (SDMF) methodology to 429 normal genomic sequences, and compared detected CNVs to those in the Database of Genomic Variants. SDMF successfully detected a significant number of previously identified CNVs with frequencies of occurrence ≥10%, as well as unreported short CNVs. Its performance was also compared to circular binary segmentation (CBS) through simulations. SDMF had a significantly lower false detection rate and was significantly faster than CBS, an important advantage for handling large datasets generated with high-resolution arrays. By focusing on improving SNR (instead of the robustness of the detection algorithm), SDMF is a very promising methodology for identifying CNVs at all genomic spatial scales.

AB - Motivation: Human genomic variability occurs at different scales, from single nucleotide polymorphisms (SNPs) to large DNA segments. Copy number variations (CNVs) represent a significant part of our genetic heterogeneity and have also been associated with many diseases and disorders. Short, localized CNVs, which may play an important role in human disease, may be undetectable in noisy genomic data. Therefore, robust methodologies are needed for their detection. Furthermore, for meaningful identification of pathological CNVs, estimation of normal allelic aberrations is necessary. Results: We developed a signal processing-based methodology for sequence denoising followed by pattern matching, to increase SNR in genomic data and improve CNV detection. We applied this signal-decomposition-matched filtering (SDMF) methodology to 429 normal genomic sequences, and compared detected CNVs to those in the Database of Genomic Variants. SDMF successfully detected a significant number of previously identified CNVs with frequencies of occurrence ≥10%, as well as unreported short CNVs. Its performance was also compared to circular binary segmentation (CBS) through simulations. SDMF had a significantly lower false detection rate and was significantly faster than CBS, an important advantage for handling large datasets generated with high-resolution arrays. By focusing on improving SNR (instead of the robustness of the detection algorithm), SDMF is a very promising methodology for identifying CNVs at all genomic spatial scales.

UR - http://www.scopus.com/inward/record.url?scp=80051923314&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80051923314&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btr402

DO - 10.1093/bioinformatics/btr402

M3 - Article

C2 - 21752800

AN - SCOPUS:80051923314

VL - 27

SP - 2338

EP - 2345

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 17

M1 - btr402

ER -