Machine learning selected smoking-associated DNA methylation signatures that predict HIV prognosis and mortality

Xinyu Zhang, Ying Hu, Bradley Aouizerat, Gang Peng, Vincent C. Marconi, Michael J. Corley, Todd Hulgan, Kendall J. Bryant, Hongyu Zhao, John H. Krystal, Amy C. Justice, Ke Xu

Research output: Contribution to journal, Article

Abstract

Background: The effects of tobacco smoking on epigenome-wide methylation signatures in white blood cells (WBCs) collected from persons living with HIV may have important implications for their immune-related outcomes, including frailty and mortality. Applying a machine learning approach to the analysis of CpG methylation in the epigenome enables the selection of phenotypically relevant features from high-dimensional data. Using this approach, we report that a set of smoking-associated methylated CpGs predicts HIV prognosis and mortality in an HIV-positive veteran population.

Results: We first identified 137 epigenome-wide significant CpGs for smoking in WBCs from 1137 HIV-positive individuals (p < 1.70E-07). To examine whether smoking-associated CpGs were predictive of HIV frailty and mortality, we applied ensemble-based machine learning to build a model in a training sample using 408,583 CpGs. A set of 698 CpGs was selected and was predictive of high HIV frailty in a testing sample [area under the curve (AUC) = 0.73, 95% CI 0.63-0.83]; this result was replicated in an independent sample (AUC = 0.78, 95% CI 0.73-0.83). We further found that a DNA methylation index constructed from the 698 CpGs was associated with 5-year survival (HR = 1.46, 95% CI 1.06-2.02, p = 0.02). Interestingly, the 698 CpGs, located on 445 genes, were enriched in the integrin signaling pathway (p = 9.55E-05, false discovery rate = 0.036), which is responsible for the regulation of the cell cycle, differentiation, and adhesion.

Conclusion: We demonstrated that smoking-associated DNA methylation features in white blood cells predict HIV infection-related clinical outcomes in a population living with HIV.
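The AUC values with 95% confidence intervals quoted in the Results can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not state how the intervals were computed, so the percentile bootstrap used here is an assumption, the data are synthetic, and the function names `auc` and `bootstrap_ci` are hypothetical.

```python
import random


def auc(scores, labels):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(scores, labels, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method,
    not taken from the paper)."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that contain only one class; AUC is undefined there.
        if 0 < sum(ys) < n:
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic example: 60 "high frailty" cases (label 1) and 60 controls (label 0),
# with a hypothetical model score that is shifted upward for the cases.
demo_rng = random.Random(42)
labels = [1] * 60 + [0] * 60
scores = [demo_rng.gauss(0.8, 1.0) for _ in range(60)] + \
         [demo_rng.gauss(0.0, 1.0) for _ in range(60)]

point = auc(scores, labels)
low, high = bootstrap_ci(scores, labels)
print(f"AUC = {point:.2f}, 95% CI {low:.2f}-{high:.2f}")
```

The pairwise definition keeps the sketch dependency-free; for cohorts of the size described in the abstract, a rank-based implementation from a statistics library would be the more practical choice.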

Original language: English (US)
Article number: 155
Journal: Clinical Epigenetics
Volume: 10
Issue number: 1
DOI: https://doi.org/10.1186/s13148-018-0591-z
State: Published - Dec 13 2018

Fingerprint

DNA Methylation
Smoking
HIV
Mortality
Leukocytes
Methylation
Area Under Curve
Veterans
Machine Learning
Cell Adhesion
Integrins
Population
HIV Infections
Cell Differentiation
Cell Cycle
Survival Rate
DNA
Genes

Keywords

  • DNA methylation
  • Ensemble machine learning
  • HIV frailty
  • Mortality
  • Tobacco smoking

ASJC Scopus subject areas

  • Molecular Biology
  • Genetics
  • Developmental Biology
  • Genetics (clinical)

Cite this

Machine learning selected smoking-associated DNA methylation signatures that predict HIV prognosis and mortality. / Zhang, Xinyu; Hu, Ying; Aouizerat, Bradley; Peng, Gang; Marconi, Vincent C.; Corley, Michael J.; Hulgan, Todd; Bryant, Kendall J.; Zhao, Hongyu; Krystal, John H.; Justice, Amy C.; Xu, Ke.

In: Clinical Epigenetics, Vol. 10, No. 1, 155, 13.12.2018.

Zhang, X, Hu, Y, Aouizerat, B, Peng, G, Marconi, VC, Corley, MJ, Hulgan, T, Bryant, KJ, Zhao, H, Krystal, JH, Justice, AC & Xu, K 2018, 'Machine learning selected smoking-associated DNA methylation signatures that predict HIV prognosis and mortality', Clinical Epigenetics, vol. 10, no. 1, 155. https://doi.org/10.1186/s13148-018-0591-z
@article{078041c8de2741e9a22ce41484959ae9,
title = "Machine learning selected smoking-associated DNA methylation signatures that predict HIV prognosis and mortality",
keywords = "DNA methylation, Ensemble machine learning, HIV frailty, Mortality, Tobacco smoking",
author = "Xinyu Zhang and Ying Hu and Bradley Aouizerat and Gang Peng and Marconi, {Vincent C.} and Corley, {Michael J.} and Todd Hulgan and Bryant, {Kendall J.} and Hongyu Zhao and Krystal, {John H.} and Justice, {Amy C.} and Ke Xu",
year = "2018",
month = "12",
day = "13",
doi = "10.1186/s13148-018-0591-z",
language = "English (US)",
volume = "10",
journal = "Clinical Epigenetics",
issn = "1868-7075",
publisher = "Springer Verlag",
number = "1",

}

TY - JOUR

T1 - Machine learning selected smoking-associated DNA methylation signatures that predict HIV prognosis and mortality

AU - Zhang, Xinyu

AU - Hu, Ying

AU - Aouizerat, Bradley

AU - Peng, Gang

AU - Marconi, Vincent C.

AU - Corley, Michael J.

AU - Hulgan, Todd

AU - Bryant, Kendall J.

AU - Zhao, Hongyu

AU - Krystal, John H.

AU - Justice, Amy C.

AU - Xu, Ke

PY - 2018/12/13

Y1 - 2018/12/13

KW - DNA methylation

KW - Ensemble machine learning

KW - HIV frailty

KW - Mortality

KW - Tobacco smoking

UR - http://www.scopus.com/inward/record.url?scp=85058594547&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85058594547&partnerID=8YFLogxK

U2 - 10.1186/s13148-018-0591-z

DO - 10.1186/s13148-018-0591-z

M3 - Article

C2 - 30545403

AN - SCOPUS:85058594547

VL - 10

JO - Clinical Epigenetics

JF - Clinical Epigenetics

SN - 1868-7075

IS - 1

M1 - 155

ER -