Efficient and exact maximum likelihood Quantisation of genomic features using dynamic programming

Mingzhou Song, Robert M. Haralick, Stephane Boissinot

Research output: Contribution to journal › Article

Abstract

An efficient and exact dynamic programming algorithm is introduced to quantise a continuous random variable into a discrete random variable that maximises the likelihood of the quantised probability distribution for the original continuous random variable. Quantisation is often a useful preprocessing step before statistical analysis and before fitting large discrete network models to observations of multiple continuous random variables. The quantisation algorithm is applied to genomic features in the human genome, including the recombination rate distribution across the chromosomes and the non-coding transposable element LINE-1. The association pattern is studied between the recombination rate, quantised at genomic locations around LINE-1 elements, and the length groups of LINE-1 elements, obtained by quantising LINE-1 length. The exact, density-preserving quantisation approach offers a superior alternative to the inexact, distance-based, iterative univariate k-means clustering algorithm for discretisation.
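
The abstract does not give the recurrence, so the following is only a minimal sketch of one generic way to pose maximum-likelihood quantisation as a dynamic programme. It is not the authors' algorithm. It assumes a piecewise-constant (histogram) density, restricts cut points to midpoints between distinct consecutive sorted values, and runs in O(k n^2) time. The function name `ml_quantise` and all variable names are hypothetical.

```python
import math


def ml_quantise(values, k):
    """Return (cut_points, log_likelihood) of the best k-bin histogram.

    Assumption: the quantised model is a piecewise-constant density whose
    bin edges are the data minimum, the data maximum, and midpoints between
    distinct consecutive sorted values.
    """
    x = sorted(values)
    n = len(x)
    if k < 1 or n < 2 or x[0] == x[-1]:
        raise ValueError("need k >= 1 and at least two distinct values")

    # edge[i] is the boundary with exactly i points to its left.
    edge = [x[0]] + [(x[i - 1] + x[i]) / 2 for i in range(1, n)] + [x[-1]]
    # A boundary between two equal values would create a zero-width bin.
    allowed = [True] + [x[i - 1] < x[i] for i in range(1, n)] + [True]

    def bin_loglik(i, j):
        # Log-likelihood of points x[i:j] under a flat density on their bin.
        count = j - i
        return count * math.log(count / (n * (edge[j] - edge[i])))

    neg_inf = float("-inf")
    # best[b][j]: best log-likelihood covering the first j points with b bins.
    best = [[neg_inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for b in range(1, k + 1):
        for j in range(b, n + 1):
            if not allowed[j]:
                continue
            for i in range(b - 1, j):
                if best[b - 1][i] == neg_inf:
                    continue
                cand = best[b - 1][i] + bin_loglik(i, j)
                if cand > best[b][j]:
                    best[b][j] = cand
                    back[b][j] = i
    if best[k][n] == neg_inf:
        raise ValueError("k exceeds the number of distinct values")

    # Walk the back-pointers to recover the interior cut points.
    cuts, j = [], n
    for b in range(k, 0, -1):
        j = back[b][j]
        if j > 0:
            cuts.append(edge[j])
    return cuts[::-1], best[k][n]
```

Calling `ml_quantise(data, 3)` on a list of floats returns two interior cut points and the maximised log-likelihood. Because the search is exhaustive over the allowed boundaries, the result is optimal for this particular model, unlike an iterative k-means run that may stop at a local optimum. A histogram likelihood tends to favour narrow bins around dense clumps, so a practical version would probably also enforce a minimum count per bin.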

Original language: English (US)
Pages (from-to): 123-141
Number of pages: 19
Journal: International Journal of Data Mining and Bioinformatics
Volume: 4
Issue number: 2
DOI: 10.1504/IJDMB.2010.032167
State: Published - Mar 1 2010

Fingerprint

Long Interspersed Nucleotide Elements
Dynamic programming
Random variables
Maximum likelihood
programming
Genetic Recombination
Alkali Metals
DNA Transposable Elements
Human Genome
statistical analysis
Cluster Analysis
Chromosomes
Clustering algorithms
Probability distributions
Statistical methods
Genes
Group

Keywords

  • Discretisation
  • Dynamic programming
  • LINE-1
  • Quantisation
  • Recombination rate distribution
  • Retrotransposon
  • Transposable elements

ASJC Scopus subject areas

  • Information Systems
  • Biochemistry, Genetics and Molecular Biology (all)
  • Library and Information Sciences

Cite this

Efficient and exact maximum likelihood Quantisation of genomic features using dynamic programming. / Song, Mingzhou; Haralick, Robert M.; Boissinot, Stephane.

In: International Journal of Data Mining and Bioinformatics, Vol. 4, No. 2, 01.03.2010, p. 123-141.

@article{ed29f06b3d024480af3b63fec2d11cc9,
title = "Efficient and exact maximum likelihood Quantisation of genomic features using dynamic programming",
keywords = "Discretisation, Dynamic programming, LINE-1, Quantisation, Recombination rate distribution, Retrotransposon, Transposable elements",
author = "Mingzhou Song and Haralick, {Robert M.} and Stephane Boissinot",
year = "2010",
month = "3",
day = "1",
doi = "10.1504/IJDMB.2010.032167",
language = "English (US)",
volume = "4",
pages = "123--141",
journal = "International Journal of Data Mining and Bioinformatics",
issn = "1748-5673",
publisher = "Inderscience Enterprises Ltd",
number = "2",

}

TY - JOUR

T1 - Efficient and exact maximum likelihood Quantisation of genomic features using dynamic programming

AU - Song, Mingzhou

AU - Haralick, Robert M.

AU - Boissinot, Stephane

PY - 2010/3/1

Y1 - 2010/3/1

KW - Discretisation

KW - Dynamic programming

KW - LINE-1

KW - Quantisation

KW - Recombination rate distribution

KW - Retrotransposon

KW - Transposable elements

UR - http://www.scopus.com/inward/record.url?scp=77951444053&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77951444053&partnerID=8YFLogxK

U2 - 10.1504/IJDMB.2010.032167

DO - 10.1504/IJDMB.2010.032167

M3 - Article

VL - 4

SP - 123

EP - 141

JO - International Journal of Data Mining and Bioinformatics

JF - International Journal of Data Mining and Bioinformatics

SN - 1748-5673

IS - 2

ER -