Placing probes along the genome using pairwise distance data

Will Casey, Bhubaneswar Mishra, Mike Wigler

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We describe the theoretical basis of an approach using microarrays of probes and libraries of BACs to construct maps of the probes, by assigning relative locations to the probes along the genome. The method depends on several hybridization experiments: in each experiment, we sample (with replacement) a large library of BACs to select a small collection of BACs for hybridization with the probe arrays. The resulting data can be used to assign a local distance metric relating the arrayed probes, and then to position the probes with respect to each other. The method is shown to be capable of achieving surprisingly high accuracy within individual contigs and with fewer than 100 microarray hybridization experiments, even when the probes and clones number about 10^5, thus involving potentially around 10^10 individual hybridizations. This approach is not dependent upon existing BAC contig information, and so should be particularly useful in the application to previously uncharacterized genomes. Nevertheless, the method may be used to independently validate a BAC contig map or a minimal tiling path obtained by intensive genomic sequence determination. We provide a detailed probabilistic analysis to characterize the outcome of a single hybridization experiment and what information can be garnered about the physical distance between any pair of probes. This analysis then leads to a formulation of a likelihood optimization problem whose solution leads to the relative probe locations. After reformulating the optimization problem in a graph-theoretic setting and by exploiting the underlying probabilistic structure, we develop an efficient approximation algorithm for our original problem. We have implemented the algorithm and conducted several experiments for varied sets of parameters. Our empirical results are highly promising and are reported here as well. We also explore how the probabilistic analysis and algorithmic efficiency issues affect the design of the underlying biochemical experiments.
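
The pooling idea in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' algorithm: the abstract does not give the actual distance metric or likelihood, so the Hamming distance between probe signal vectors is used here as a stand-in proxy. The evenly tiled BAC library, the genome length, the BAC length, the pool size, and the names `simulate_signals` and `hamming_matrix` are all hypothetical choices made for illustration.

```python
import random


def simulate_signals(probes, bac_starts, bac_length, pool_size,
                     num_experiments, seed=0):
    """Simulate pooled hybridization experiments.

    Returns a list with one row per experiment; row[i] is True when
    probe i lies inside at least one BAC of that experiment's pool.
    """
    rng = random.Random(seed)
    signals = []
    for _ in range(num_experiments):
        # Sample the library WITH replacement, as the abstract describes.
        pool = rng.choices(bac_starts, k=pool_size)
        row = [any(s <= p <= s + bac_length for s in pool) for p in probes]
        signals.append(row)
    return signals


def hamming_matrix(signals):
    """Count, for each probe pair, the experiments in which signals differ."""
    n = len(signals[0])
    return [[sum(row[i] != row[j] for row in signals) for j in range(n)]
            for i in range(n)]


# Hypothetical toy parameters (assumptions, not values from the paper).
GENOME_LENGTH = 100_000
BAC_LENGTH = 1_000
# Assumed library: one BAC starting every 100 positions along the genome.
bac_starts = list(range(0, GENOME_LENGTH - BAC_LENGTH + 1, 100))
# Two adjacent probes, one nearby probe, and one distant probe.
probes = [1_010, 1_020, 1_510, 50_010]

signals = simulate_signals(probes, bac_starts, BAC_LENGTH,
                           pool_size=70, num_experiments=100)
dist = hamming_matrix(signals)
print("Hamming distances from probe 0:", dist[0])
```

In this toy setup the two adjacent probes are covered by exactly the same BACs, so their signals never differ, while a distant probe shares no BACs with them and its signal is close to independent. A positioning method such as the one the abstract describes would start from pairwise information of this kind; how the paper converts it into distances and locations is not stated here.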

Original language: English (US)
Title of host publication: Algorithms in Bioinformatics - First International Workshop, WABI 2001 Århus Denmark, August 28-31, 2001 Proceedings
Publisher: Springer Verlag
Pages: 52-68
Number of pages: 17
Volume: 2149
ISBN (Print): 3540425160
DOIs: https://doi.org/10.1007/3-540-44696-6_5
State: Published - 2001
Event: 1st International Workshop on Algorithms in Bioinformatics, WABI 2001 - Arhus, Denmark
Duration: Aug 28 2001 to Aug 31 2001

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 2149
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 1st International Workshop on Algorithms in Bioinformatics, WABI 2001
Country: Denmark
City: Arhus
Period: 8/28/01 to 8/31/01

Fingerprint

Pairwise
Genome
Probe
Genes
Experiment
Probabilistic Analysis
Experiments
Microarrays
Microarray
Optimization Problem
Distance Metric
Approximation algorithms
Tiling
Clone
Replacement
Genomics
Assign
Approximation Algorithms
Likelihood
High Accuracy

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Cite this

Casey, W., Mishra, B., & Wigler, M. (2001). Placing probes along the genome using pairwise distance data. In Algorithms in Bioinformatics - First International Workshop, WABI 2001 Århus Denmark, August 28-31, 2001 Proceedings (Vol. 2149, pp. 52-68). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 2149). Springer Verlag. https://doi.org/10.1007/3-540-44696-6_5

@inproceedings{fe45369c79054b86b53d2a303b07f6cf,
title = "Placing probes along the genome using pairwise distance data",
abstract = "We describe the theoretical basis of an approach using microarrays of probes and libraries of BACs to construct maps of the probes, by assigning relative locations to the probes along the genome. The method depends on several hybridization experiments: in each experiment, we sample (with replacement) a large library of BACs to select a small collection of BACs for hybridization with the probe arrays. The resulting data can be used to assign a local distance metric relating the arrayed probes, and then to position the probes with respect to each other. The method is shown to be capable of achieving surprisingly high accuracy within individual contigs and with fewer than 100 microarray hybridization experiments, even when the probes and clones number about $10^{5}$, thus involving potentially around $10^{10}$ individual hybridizations. This approach is not dependent upon existing BAC contig information, and so should be particularly useful in the application to previously uncharacterized genomes. Nevertheless, the method may be used to independently validate a BAC contig map or a minimal tiling path obtained by intensive genomic sequence determination. We provide a detailed probabilistic analysis to characterize the outcome of a single hybridization experiment and what information can be garnered about the physical distance between any pair of probes. This analysis then leads to a formulation of a likelihood optimization problem whose solution leads to the relative probe locations. After reformulating the optimization problem in a graph-theoretic setting and by exploiting the underlying probabilistic structure, we develop an efficient approximation algorithm for our original problem. We have implemented the algorithm and conducted several experiments for varied sets of parameters. Our empirical results are highly promising and are reported here as well. We also explore how the probabilistic analysis and algorithmic efficiency issues affect the design of the underlying biochemical experiments.",
author = "Will Casey and Bhubaneswar Mishra and Mike Wigler",
year = "2001",
doi = "10.1007/3-540-44696-6_5",
language = "English (US)",
isbn = "3540425160",
volume = "2149",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "52--68",
booktitle = "Algorithms in Bioinformatics - First International Workshop, WABI 2001 {\AA}rhus Denmark, August 28-31, 2001 Proceedings",

}

TY - GEN

T1 - Placing probes along the genome using pairwise distance data

AU - Casey, Will

AU - Mishra, Bhubaneswar

AU - Wigler, Mike

PY - 2001

Y1 - 2001

N2 - We describe the theoretical basis of an approach using microarrays of probes and libraries of BACs to construct maps of the probes, by assigning relative locations to the probes along the genome. The method depends on several hybridization experiments: in each experiment, we sample (with replacement) a large library of BACs to select a small collection of BACs for hybridization with the probe arrays. The resulting data can be used to assign a local distance metric relating the arrayed probes, and then to position the probes with respect to each other. The method is shown to be capable of achieving surprisingly high accuracy within individual contigs and with fewer than 100 microarray hybridization experiments, even when the probes and clones number about 10^5, thus involving potentially around 10^10 individual hybridizations. This approach is not dependent upon existing BAC contig information, and so should be particularly useful in the application to previously uncharacterized genomes. Nevertheless, the method may be used to independently validate a BAC contig map or a minimal tiling path obtained by intensive genomic sequence determination. We provide a detailed probabilistic analysis to characterize the outcome of a single hybridization experiment and what information can be garnered about the physical distance between any pair of probes. This analysis then leads to a formulation of a likelihood optimization problem whose solution leads to the relative probe locations. After reformulating the optimization problem in a graph-theoretic setting and by exploiting the underlying probabilistic structure, we develop an efficient approximation algorithm for our original problem. We have implemented the algorithm and conducted several experiments for varied sets of parameters. Our empirical results are highly promising and are reported here as well. We also explore how the probabilistic analysis and algorithmic efficiency issues affect the design of the underlying biochemical experiments.

UR - http://www.scopus.com/inward/record.url?scp=33644863597&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33644863597&partnerID=8YFLogxK

U2 - 10.1007/3-540-44696-6_5

DO - 10.1007/3-540-44696-6_5

M3 - Conference contribution

SN - 3540425160

VL - 2149

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 52

EP - 68

BT - Algorithms in Bioinformatics - First International Workshop, WABI 2001 Århus Denmark, August 28-31, 2001 Proceedings

PB - Springer Verlag

ER -