Feature-specific penalized latent class analysis for genomic data

E. Andrés Houseman, Brent A. Coull, Rebecca Betensky

Research output: Contribution to journal › Article

Abstract

Genomic data are often characterized by a moderate to large number of categorical variables observed for relatively few subjects. Some of the variables may be missing or noninformative. An example of such data is loss of heterozygosity (LOH), a dichotomous variable, observed on a moderate number of genetic markers. We first consider a latent class model where, conditional on unobserved membership in one of k classes, the variables are independent with probabilities determined by a regression model of low dimension q. Using a family of penalties including the ridge and LASSO, we extend this model to address higher-dimensional problems. Finally, we present an orthogonal map that transforms marker space to a space of "features" for which the constrained model has better predictive power. We demonstrate these methods on LOH data collected at 19 markers from 93 brain tumor patients. For this data set, the existing unpenalized latent class methodology does not produce estimates. Additionally, we show that posterior classes obtained from this method are associated with survival for these patients.
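To make the model in the abstract concrete, the following is a minimal sketch of a ridge-penalized latent class fit for binary markers. It is not the authors' implementation. It assumes a saturated model with one logit per class and marker (rather than the low-dimensional regression or the orthogonal feature map the abstract describes), a ridge penalty that shrinks every logit toward zero, complete data with no missing markers, and simulated input. The names `fit_penalized_lca`, `lam`, and `n_inner` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def fit_penalized_lca(Y, k=2, lam=1.0, n_iter=200, n_inner=5, seed=0):
    """Generalized EM for a k-class latent class model on binary data.

    Y is an (n subjects) x (m markers) array of 0/1 values.
    theta[c, j] is the logit of P(Y_j = 1 | class c). Each logit carries
    a ridge penalty (lam / 2) * theta**2 (an illustrative assumption).
    Returns class proportions, class-by-marker probabilities, and the
    posterior class probabilities from the last E-step.
    """
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    theta = rng.normal(scale=0.5, size=(k, m))
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: markers are independent given the class, so
        # log p(y_i | c) = sum_j [y_ij * theta_cj - log(1 + exp(theta_cj))].
        loglik = Y @ theta.T - np.logaddexp(0.0, theta).sum(axis=1)
        log_post = loglik + np.log(np.clip(pi, 1e-12, None))
        log_post -= log_post.max(axis=1, keepdims=True)
        W = np.exp(log_post)
        W /= W.sum(axis=1, keepdims=True)

        # M-step for the class proportions (closed form).
        n_c = W.sum(axis=0)
        pi = n_c / n

        # M-step for the logits: the penalized objective has no closed
        # form, so take a few ascent steps. Dividing the gradient by the
        # curvature bound n_c / 4 + lam guarantees each step does not
        # decrease the penalized expected log-likelihood.
        s = W.T @ Y  # weighted counts of ones, shape (k, m)
        bound = n_c[:, None] / 4.0 + lam
        for _ in range(n_inner):
            grad = s - n_c[:, None] * sigmoid(theta) - lam * theta
            theta = theta + grad / bound
    return pi, sigmoid(theta), W


# Simulated data only: 93 subjects and 19 binary markers, two classes.
rng = np.random.default_rng(1)
z_true = rng.integers(0, 2, size=93)
p_true = np.where(z_true[:, None] == 1, 0.9, 0.1)
Y_sim = (rng.random((93, 19)) < p_true).astype(float)

pi_hat, prob_hat, post = fit_penalized_lca(Y_sim, k=2, lam=1.0)
classes = post.argmax(axis=1)  # posterior class assignment per subject
```

Larger values of `lam` pull the estimated probabilities toward 0.5, which keeps the estimates finite even when a marker is all zeros or all ones within a class. Replacing the squared penalty with an absolute-value penalty would give a LASSO-type variant, but that requires a different update rule than the one sketched here.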

Original language: English (US)
Journal: Biometrics
Volume: 62
Issue number: 4
DOI: 10.1111/j.1541-0420.2006.00566.x
State: Published - Dec 1 2006

Keywords

  • Constrained estimation
  • LASSO
  • Loss of heterozygosity
  • Mixture models
  • Penalized likelihood
  • Ridge regression

ASJC Scopus subject areas

  • Statistics and Probability
  • Medicine (all)
  • Immunology and Microbiology (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Agricultural and Biological Sciences (all)
  • Applied Mathematics

Cite this

Feature-specific penalized latent class analysis for genomic data. / Houseman, E. Andrés; Coull, Brent A.; Betensky, Rebecca.

In: Biometrics, Vol. 62, No. 4, 01.12.2006.

Houseman, E. Andrés; Coull, Brent A.; Betensky, Rebecca. / Feature-specific penalized latent class analysis for genomic data. In: Biometrics. 2006; Vol. 62, No. 4.
@article{226b7063979c478f9e9f2ce7a000fa42,
title = "Feature-specific penalized latent class analysis for genomic data",
keywords = "Constrained estimation, LASSO, Loss of heterozygosity, Mixture models, Penalized likelihood, Ridge regression",
author = "Houseman, {E. Andr{\'e}s} and Coull, {Brent A.} and Betensky, Rebecca",
year = "2006",
month = "12",
day = "1",
doi = "10.1111/j.1541-0420.2006.00566.x",
language = "English (US)",
volume = "62",
journal = "Biometrics",
issn = "0006-341X",
publisher = "Wiley-Blackwell",
number = "4",

}

TY - JOUR

T1 - Feature-specific penalized latent class analysis for genomic data

AU - Houseman, E. Andrés

AU - Coull, Brent A.

AU - Betensky, Rebecca

PY - 2006/12/1

Y1 - 2006/12/1

KW - Constrained estimation

KW - LASSO

KW - Loss of heterozygosity

KW - Mixture models

KW - Penalized likelihood

KW - Ridge regression

UR - http://www.scopus.com/inward/record.url?scp=33751261808&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33751261808&partnerID=8YFLogxK

U2 - 10.1111/j.1541-0420.2006.00566.x

DO - 10.1111/j.1541-0420.2006.00566.x

M3 - Article

VL - 62

JO - Biometrics

JF - Biometrics

SN - 0006-341X

IS - 4

ER -