Variable importance in matched case-control studies in settings of high dimensional data

Raji Balasubramanian, E. Andres Houseman, Brent A. Coull, Michael H. Lev, Lee H. Schwamm, Rebecca Betensky

Research output: Contribution to journal (Article)

Abstract

We propose a method for assessing variable importance in matched case-control investigations and other highly stratified studies characterized by high-dimensional data (p >> n). In simulated and real data sets, we show that the proposed algorithm performs better than a conventional univariate method (conditional logistic regression) and a popular multivariable algorithm (random forests) that does not take the matching into account. The methods are applicable to wide-ranging, high-impact clinical studies, including metabolomic and proteomic studies and neuroimaging analyses, such as those assessing stroke and Alzheimer's disease. The proposed methods have been implemented in a freely available R library (http://cran.r-project.org/web/packages/RPCLR/index.html).
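
For orientation, the conventional univariate baseline named in the abstract can be sketched for the simplest design, 1:1 matched pairs. This is not the authors' proposed algorithm and not the RPCLR interface. It is a minimal illustration, assuming one case and one control per stratum, of how each variable could be scored on its own with the score test from a conditional logistic model. The function names `paired_score_statistic` and `rank_variables` are hypothetical.

```python
import math


def paired_score_statistic(case_values, control_values):
    """Score-test z statistic for beta = 0 in a 1:1 matched conditional
    logistic model with a single covariate.

    With d_i = x_case_i - x_control_i, the conditional log-likelihood is
    sum_i log(sigmoid(beta * d_i)), so at beta = 0 the score is sum(d_i) / 2
    and the information is sum(d_i ** 2) / 4.
    """
    diffs = [c - k for c, k in zip(case_values, control_values)]
    sum_sq = sum(d * d for d in diffs)
    if sum_sq == 0.0:
        # No within-pair variation: the variable carries no information.
        return 0.0
    return sum(diffs) / math.sqrt(sum_sq)


def rank_variables(cases, controls):
    """Rank variables by |z|, largest first.

    cases[i][j] and controls[i][j] hold variable j for the case and the
    control in matched pair i (an assumed layout for this sketch).
    """
    n_vars = len(cases[0])
    scores = []
    for j in range(n_vars):
        z = paired_score_statistic(
            [row[j] for row in cases],
            [row[j] for row in controls],
        )
        scores.append((j, abs(z)))
    return sorted(scores, key=lambda t: t[1], reverse=True)


# Toy data: three matched pairs, two variables (illustrative values only).
cases = [[2.0, 0.1], [3.0, -0.2], [2.5, 0.0]]
controls = [[1.0, 0.0], [1.5, 0.1], [1.0, 0.2]]
ranking = rank_variables(cases, controls)
```

Because each variable is scored separately, a ranking of this kind respects the matching but ignores joint effects among variables, which is the limitation a multivariable approach is meant to address.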

Original language: English (US)
Pages (from-to): 639-655
Number of pages: 17
Journal: Journal of the Royal Statistical Society. Series C: Applied Statistics
Volume: 63
Issue number: 4
DOI: 10.1111/rssc.12056
State: Published - Jan 1 2014

Fingerprint

Matched Case-control Study
High-dimensional Data
Conditional Logistic Regression
Metabolomics
Neuroimaging
Case-control
Alzheimer's Disease
Random Forest
Proteomics
Stroke
Univariate

Keywords

  • Data mining
  • High dimensional data
  • Matched case-control studies

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Cite this

Variable importance in matched case-control studies in settings of high dimensional data. / Balasubramanian, Raji; Houseman, E. Andres; Coull, Brent A.; Lev, Michael H.; Schwamm, Lee H.; Betensky, Rebecca.

In: Journal of the Royal Statistical Society. Series C: Applied Statistics, Vol. 63, No. 4, 01.01.2014, p. 639-655.

@article{40ea92c379664ebe8deafea7a6fdd68f,
title = "Variable importance in matched case-control studies in settings of high dimensional data",
abstract = "Summary: We propose a method for assessing variable importance in matched case-control investigations and other highly stratified studies characterized by high dimensional data (p>>n). In simulated and real data sets, we show that the algorithm proposed performs better than a conventional univariate method (conditional logistic regression) and a popular multivariable algorithm (random forests) that does not take the matching into account. The methods are applicable to wide ranging, high impact clinical studies including metabolomic, proteomic studies and neuroimaging analyses, such as those assessing stroke and Alzheimer's disease. The methods proposed have been implemented in a freely available R library (http://cran.r-project.org/web/packages/RPCLR/index.html).",
keywords = "Data mining, High dimensional data, Matched case-control studies",
author = "Balasubramanian, Raji and Houseman, E. Andres and Coull, Brent A. and Lev, Michael H. and Schwamm, Lee H. and Betensky, Rebecca",
year = "2014",
month = "1",
day = "1",
doi = "10.1111/rssc.12056",
language = "English (US)",
volume = "63",
pages = "639--655",
journal = "Journal of the Royal Statistical Society. Series C: Applied Statistics",
issn = "0035-9254",
publisher = "Wiley-Blackwell",
number = "4",

}

TY  - JOUR
T1  - Variable importance in matched case-control studies in settings of high dimensional data
AU  - Balasubramanian, Raji
AU  - Houseman, E. Andres
AU  - Coull, Brent A.
AU  - Lev, Michael H.
AU  - Schwamm, Lee H.
AU  - Betensky, Rebecca
PY  - 2014/1/1
Y1  - 2014/1/1
AB  - Summary: We propose a method for assessing variable importance in matched case-control investigations and other highly stratified studies characterized by high dimensional data (p>>n). In simulated and real data sets, we show that the algorithm proposed performs better than a conventional univariate method (conditional logistic regression) and a popular multivariable algorithm (random forests) that does not take the matching into account. The methods are applicable to wide ranging, high impact clinical studies including metabolomic, proteomic studies and neuroimaging analyses, such as those assessing stroke and Alzheimer's disease. The methods proposed have been implemented in a freely available R library (http://cran.r-project.org/web/packages/RPCLR/index.html).
KW  - Data mining
KW  - High dimensional data
KW  - Matched case-control studies
UR  - http://www.scopus.com/inward/record.url?scp=84905032278&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84905032278&partnerID=8YFLogxK
U2  - 10.1111/rssc.12056
DO  - 10.1111/rssc.12056
M3  - Article
VL  - 63
SP  - 639
EP  - 655
JO  - Journal of the Royal Statistical Society. Series C: Applied Statistics
JF  - Journal of the Royal Statistical Society. Series C: Applied Statistics
SN  - 0035-9254
IS  - 4
ER  - 