Variable selection and prediction using a nested, matched case-control study: Application to hospital acquired pneumonia in stroke patients

Jing Qian, Seyedmehdi Payabvash, André Kemmling, Michael H. Lev, Lee H. Schwamm, Rebecca Betensky

Research output: Contribution to journal › Article

Abstract

Matched case-control designs are commonly used in epidemiologic studies for increased efficiency. These designs have recently been introduced to the setting of modern imaging and genomic studies, which are characterized by high-dimensional covariates. However, appropriate statistical analyses that adjust for the matching have not been widely adopted. A matched case-control study of 430 acute ischemic stroke patients was conducted at Massachusetts General Hospital (MGH) in order to identify specific brain regions of acute infarction that are associated with hospital acquired pneumonia (HAP) in these patients. There are 138 brain regions in which infarction was measured, which introduce nearly 10,000 two-way interactions, and challenge the statistical analysis. We investigate penalized conditional and unconditional logistic regression approaches to this variable selection problem that properly differentiate between selection of main effects and of interactions, and that acknowledge the matching. This neuroimaging study was nested within a larger prospective study of HAP in 1915 stroke patients at MGH, which recorded clinical variables, but did not include neuroimaging. We demonstrate how the larger study, in conjunction with the nested, matched study, affords us the capability to derive a score for prediction of HAP in future stroke patients based on imaging and clinical features. We evaluate the proposed methods in simulation studies and we apply them to the MGH HAP study.
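The scale of the selection problem and the matched analysis described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes 1:1 matched pairs (the abstract does not state the matching ratio), in which case the conditional likelihood contribution of a pair depends only on the case-minus-control covariate difference, and it applies a plain lasso penalty fitted by proximal gradient descent. The function names `soft_threshold` and `lasso_conditional_logistic` and the parameters `lam` and `n_iter` are hypothetical, and the sketch does not treat main effects and interactions differently as the paper's approach does.

```python
from math import comb

import numpy as np

# 138 brain regions give C(138, 2) two-way interactions, the "nearly 10,000" in the abstract.
n_regions = 138
n_interactions = comb(n_regions, 2)  # 9453


def soft_threshold(v, t):
    """Proximal operator of the L1 penalty: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_conditional_logistic(x_case, x_control, lam, n_iter=500):
    """Lasso-penalized conditional logistic regression for 1:1 matched pairs.

    Row i of x_case and row i of x_control form one matched pair (an assumption
    of this sketch). The pair's negative conditional log-likelihood is
    log(1 + exp(-d_i @ beta)) with d_i = x_case[i] - x_control[i], so no
    intercept or stratum effect has to be estimated.
    """
    d = np.asarray(x_case, dtype=float) - np.asarray(x_control, dtype=float)
    n, p = d.shape
    # Step size 1/L, where L = ||d||_2^2 / (4 n) bounds the curvature of the mean loss.
    step = 4.0 * n / (np.linalg.norm(d, 2) ** 2 + 1e-12)
    beta = np.zeros(p)
    for _ in range(n_iter):
        z = d @ beta
        w = 1.0 / (1.0 + np.exp(z))  # derivative weight: sigmoid(-z)
        grad = -(d.T @ w) / n  # gradient of the mean negative log-likelihood
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta
```

Calling `lasso_conditional_logistic(x_case, x_control, lam=0.1)` on two arrays of equal shape returns a coefficient vector in which covariates with no within-pair signal are set exactly to zero. Working on within-pair differences is what lets the fit acknowledge the matching; an unconditional fit would instead pool all subjects and estimate an intercept.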

Original language: English (US)
Pages (from-to): 153-163
Number of pages: 11
Journal: Biometrics
Volume: 70
Issue number: 1
DOI: 10.1111/biom.12113
State: Published - Jan 1 2014

Fingerprint

Matched Case-control Study
Variable Selection
Case-Control Studies
Stroke
Pneumonia
General Hospitals
Prediction
Neuroimaging
Infarction
Brain
Acute
Epidemiologic Studies
Imaging
Image analysis
Logistic Models

Keywords

  • AUC
  • Cerebral infarction
  • Conditional logistic regression
  • Elastic net
  • Lasso
  • Penalized likelihood
  • ROC analysis

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry, Genetics and Molecular Biology (all)
  • Immunology and Microbiology (all)
  • Agricultural and Biological Sciences (all)
  • Applied Mathematics

Cite this

Variable selection and prediction using a nested, matched case-control study: Application to hospital acquired pneumonia in stroke patients. / Qian, Jing; Payabvash, Seyedmehdi; Kemmling, André; Lev, Michael H.; Schwamm, Lee H.; Betensky, Rebecca.

In: Biometrics, Vol. 70, No. 1, 01.01.2014, p. 153-163.

@article{86b3e1e85fac4a68b4f6b54de270f46a,
title = "Variable selection and prediction using a nested, matched case-control study: Application to hospital acquired pneumonia in stroke patients",
keywords = "AUC, Cerebral infarction, Conditional logistic regression, Elastic net, Lasso, Penalized likelihood, ROC analysis",
author = "Qian, Jing and Payabvash, Seyedmehdi and Kemmling, Andr{\'e} and Lev, {Michael H.} and Schwamm, {Lee H.} and Betensky, Rebecca",
year = "2014",
month = "1",
day = "1",
doi = "10.1111/biom.12113",
language = "English (US)",
volume = "70",
pages = "153--163",
journal = "Biometrics",
issn = "0006-341X",
publisher = "Wiley-Blackwell",
number = "1",

}

TY - JOUR

T1 - Variable selection and prediction using a nested, matched case-control study

T2 - Application to hospital acquired pneumonia in stroke patients

AU - Qian, Jing

AU - Payabvash, Seyedmehdi

AU - Kemmling, André

AU - Lev, Michael H.

AU - Schwamm, Lee H.

AU - Betensky, Rebecca

PY - 2014/1/1

Y1 - 2014/1/1

KW - AUC

KW - Cerebral infarction

KW - Conditional logistic regression

KW - Elastic net

KW - Lasso

KW - Penalized likelihood

KW - ROC analysis

UR - http://www.scopus.com/inward/record.url?scp=84895894429&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84895894429&partnerID=8YFLogxK

U2 - 10.1111/biom.12113

DO - 10.1111/biom.12113

M3 - Article

C2 - 24320930

AN - SCOPUS:84895894429

VL - 70

SP - 153

EP - 163

JO - Biometrics

JF - Biometrics

SN - 0006-341X

IS - 1

ER -