A versatile statistical analysis algorithm to detect genome copy number variation

Raoul Sam Daruwala, Archisman Rudra, Harry Ostrer, Robert Lucito, Michael Wigler, Bhubaneswar Mishra

Research output: Contribution to journal › Article

Abstract

We have developed a versatile statistical analysis algorithm for the detection of genomic aberrations in human cancer cell lines. The algorithm analyzes genomic data obtained from a variety of array technologies, such as oligonucleotide array, bacterial artificial chromosome array, or array-based comparative genomic hybridization, that operate by hybridizing with genomic material obtained from cancer and normal cells and allow detection of regions of the genome with altered copy number. The number of probes (i.e., resolution), the amount of uncharacterized noise per probe, and the severity of chromosomal aberrations per chromosomal region may vary with the underlying technology, biological sample, and sample preparation. Constrained by these uncertainties, our algorithm aims at robustness by using a priorless maximum a posteriori estimator and at efficiency by a dynamic programming implementation. We illustrate these characteristics of our algorithm by applying it to data obtained from representational oligonucleotide microarray analysis and array-based comparative genomic hybridization technology as well as to synthetic data obtained from an artificial model whose properties can be varied computationally. The algorithm can combine data from multiple sources and thus facilitate the discovery of genes and markers important in cancer, as well as the discovery of loci important in inherited genetic disease.
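
The abstract describes partitioning probe-level measurements into regions of constant copy number and finding that partition efficiently by dynamic programming. The sketch below is not the authors' estimator. It is a generic optimal-partitioning segmentation that assumes log-ratio data with independent Gaussian noise and a fixed per-segment cost (the hypothetical parameter `penalty`) standing in for a prior on breakpoints. It shows how dynamic programming finds the exact minimum of a penalized squared-error cost in O(n²) time. The function name `segment_log_ratios`, the synthetic profile, the noise level and the penalty value are all illustrative assumptions.

```python
import math
import random
from typing import List, Tuple


def segment_log_ratios(x: List[float], penalty: float) -> List[Tuple[int, int, float]]:
    """Hypothetical helper: piecewise-constant segmentation by optimal partitioning.

    Minimizes (sum of squared deviations from each segment mean) + penalty * (#segments).
    Under i.i.d. Gaussian noise this is a penalized log-likelihood; reading the
    penalty as -log(prior) per breakpoint gives a MAP-style estimate. This is an
    assumed stand-in, not the model from the paper.
    Returns (start, end_exclusive, mean) triples covering x.
    """
    n = len(x)
    if n == 0:
        return []
    # Prefix sums of values and squares give any segment's cost in O(1).
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(x):
        s[i + 1] = s[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(i: int, j: int) -> float:
        # Sum of squared deviations of x[i:j] around its mean.
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    # best[j]: minimal penalized cost for x[:j]; back[j]: start of its last segment.
    best = [0.0] + [math.inf] * n
    back = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            c = best[i] + cost(i, j) + penalty
            if c < best[j]:
                best[j] = c
                back[j] = i

    # Trace the optimal boundaries back from the end of the profile.
    segments = []
    j = n
    while j > 0:
        i = back[j]
        segments.append((i, j, (s[j] - s[i]) / (j - i)))
        j = i
    segments.reverse()
    return segments


# Hypothetical synthetic profile: normal, single-copy gain (log2(3/2) ~ 0.58), normal.
random.seed(0)
true_levels = [0.0] * 50 + [0.58] * 30 + [0.0] * 50
data = [mu + random.gauss(0.0, 0.1) for mu in true_levels]
for start, end, mean in segment_log_ratios(data, penalty=0.5):
    print(start, end, round(mean, 2))
```

With a gain of about 0.58 against noise of standard deviation 0.1, the sketch should recover three segments with boundaries near probes 50 and 80. Raising `penalty` suppresses short segments; lowering it admits more breakpoints, so in practice the penalty (or the prior it stands for) governs the trade-off between sensitivity and false breakpoints.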

Original language: English (US)
Pages (from-to): 16292-16297
Number of pages: 6
Journal: Proceedings of the National Academy of Sciences of the United States of America
Volume: 101
Issue number: 46
DOIs: 10.1073/pnas.0407247101
State: Published - Nov 16 2004

Fingerprint

Genome
Oligonucleotide Array Sequence Analysis
Comparative Genomic Hybridization
Technology
Bacterial Artificial Chromosomes
Neoplasms
Inborn Genetic Diseases
Information Storage and Retrieval
Genetic Association Studies
Microarray Analysis
Chromosome Aberrations
Uncertainty
Noise
Cell Line

Keywords

  • Array-based comparative genomic hybridization
  • Copy-number fluctuations
  • Maximum a posteriori estimator

ASJC Scopus subject areas

  • Genetics
  • General

Cite this

A versatile statistical analysis algorithm to detect genome copy number variation. / Daruwala, Raoul Sam; Rudra, Archisman; Ostrer, Harry; Lucito, Robert; Wigler, Michael; Mishra, Bhubaneswar.

In: Proceedings of the National Academy of Sciences of the United States of America, Vol. 101, No. 46, 16.11.2004, p. 16292-16297.

Daruwala, Raoul Sam ; Rudra, Archisman ; Ostrer, Harry ; Lucito, Robert ; Wigler, Michael ; Mishra, Bhubaneswar. / A versatile statistical analysis algorithm to detect genome copy number variation. In: Proceedings of the National Academy of Sciences of the United States of America. 2004 ; Vol. 101, No. 46. pp. 16292-16297.

@article{0d5cf844c77e48e4af9b02881a1dcdd8,
title = "A versatile statistical analysis algorithm to detect genome copy number variation",
abstract = "We have developed a versatile statistical analysis algorithm for the detection of genomic aberrations in human cancer cell lines. The algorithm analyzes genomic data obtained from a variety of array technologies, such as oligonucleotide array, bacterial artificial chromosome array, or array-based comparative genomic hybridization, that operate by hybridizing with genomic material obtained from cancer and normal cells and allow detection of regions of the genome with altered copy number. The number of probes (i.e., resolution), the amount of uncharacterized noise per probe, and the severity of chromosomal aberrations per chromosomal region may vary with the underlying technology, biological sample, and sample preparation. Constrained by these uncertainties, our algorithm aims at robustness by using a priorless maximum a posteriori estimator and at efficiency by a dynamic programming implementation. We illustrate these characteristics of our algorithm by applying it to data obtained from representational oligonucleotide microarray analysis and array-based comparative genomic hybridization technology as well as to synthetic data obtained from an artificial model whose properties can be varied computationally. The algorithm can combine data from multiple sources and thus facilitate the discovery of genes and markers important in cancer, as well as the discovery of loci important in inherited genetic disease.",
keywords = "Array-based comparative genomic hybridization, Copy-number fluctuations, Maximum a posteriori estimator",
author = "Daruwala, {Raoul Sam} and Archisman Rudra and Harry Ostrer and Robert Lucito and Michael Wigler and Bhubaneswar Mishra",
year = "2004",
month = "11",
day = "16",
doi = "10.1073/pnas.0407247101",
language = "English (US)",
volume = "101",
pages = "16292--16297",
journal = "Proceedings of the National Academy of Sciences of the United States of America",
issn = "0027-8424",
number = "46",

}

TY  - JOUR
T1  - A versatile statistical analysis algorithm to detect genome copy number variation
AU  - Daruwala, Raoul Sam
AU  - Rudra, Archisman
AU  - Ostrer, Harry
AU  - Lucito, Robert
AU  - Wigler, Michael
AU  - Mishra, Bhubaneswar
PY  - 2004/11/16
Y1  - 2004/11/16
N2  - We have developed a versatile statistical analysis algorithm for the detection of genomic aberrations in human cancer cell lines. The algorithm analyzes genomic data obtained from a variety of array technologies, such as oligonucleotide array, bacterial artificial chromosome array, or array-based comparative genomic hybridization, that operate by hybridizing with genomic material obtained from cancer and normal cells and allow detection of regions of the genome with altered copy number. The number of probes (i.e., resolution), the amount of uncharacterized noise per probe, and the severity of chromosomal aberrations per chromosomal region may vary with the underlying technology, biological sample, and sample preparation. Constrained by these uncertainties, our algorithm aims at robustness by using a priorless maximum a posteriori estimator and at efficiency by a dynamic programming implementation. We illustrate these characteristics of our algorithm by applying it to data obtained from representational oligonucleotide microarray analysis and array-based comparative genomic hybridization technology as well as to synthetic data obtained from an artificial model whose properties can be varied computationally. The algorithm can combine data from multiple sources and thus facilitate the discovery of genes and markers important in cancer, as well as the discovery of loci important in inherited genetic disease.
AB  - We have developed a versatile statistical analysis algorithm for the detection of genomic aberrations in human cancer cell lines. The algorithm analyzes genomic data obtained from a variety of array technologies, such as oligonucleotide array, bacterial artificial chromosome array, or array-based comparative genomic hybridization, that operate by hybridizing with genomic material obtained from cancer and normal cells and allow detection of regions of the genome with altered copy number. The number of probes (i.e., resolution), the amount of uncharacterized noise per probe, and the severity of chromosomal aberrations per chromosomal region may vary with the underlying technology, biological sample, and sample preparation. Constrained by these uncertainties, our algorithm aims at robustness by using a priorless maximum a posteriori estimator and at efficiency by a dynamic programming implementation. We illustrate these characteristics of our algorithm by applying it to data obtained from representational oligonucleotide microarray analysis and array-based comparative genomic hybridization technology as well as to synthetic data obtained from an artificial model whose properties can be varied computationally. The algorithm can combine data from multiple sources and thus facilitate the discovery of genes and markers important in cancer, as well as the discovery of loci important in inherited genetic disease.
KW  - Array-based comparative genomic hybridization
KW  - Copy-number fluctuations
KW  - Maximum a posteriori estimator
UR  - http://www.scopus.com/inward/record.url?scp=9244233813&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=9244233813&partnerID=8YFLogxK
U2  - 10.1073/pnas.0407247101
DO  - 10.1073/pnas.0407247101
M3  - Article
VL  - 101
SP  - 16292
EP  - 16297
JO  - Proceedings of the National Academy of Sciences of the United States of America
JF  - Proceedings of the National Academy of Sciences of the United States of America
SN  - 0027-8424
IS  - 46
ER  - 