Robust classification of protein variation using structural modelling and large-scale data integration

Evan H. Baugh, Riley Simmons-Edler, Christian L. Müller, Rebecca F. Alford, Natalia Volfovsky, Alex E. Lash, Richard Bonneau

Research output: Contribution to journal › Article

Abstract

Existing methods for interpreting protein variation focus on annotating mutation pathogenicity rather than detailed interpretation of variant deleteriousness and frequently use only sequence-based or structure-based information. We present VIPUR, a computational framework that seamlessly integrates sequence analysis and structural modelling (using the Rosetta protein modelling suite) to identify and interpret deleterious protein variants. To train VIPUR, we collected 9477 protein variants with known effects on protein function from multiple organisms and curated structural models for each variant from crystal structures and homology models. VIPUR can be applied to mutations in any organism's proteome with improved generalized accuracy (AUROC 0.83) and interpretability (AUPR 0.87) compared to other methods. We demonstrate that VIPUR's predictions of deleteriousness match the biological phenotypes in ClinVar and provide a clear ranking of prediction confidence. We use VIPUR to interpret known mutations associated with inflammation and diabetes, demonstrating the structural diversity of disrupted functional sites and improved interpretation of mutations associated with human diseases. Lastly, we demonstrate VIPUR's ability to highlight candidate variants associated with human diseases by applying VIPUR to de novo variants associated with autism spectrum disorders.

Original language: English (US)
Pages (from-to): 2501-2513
Number of pages: 13
Journal: Nucleic Acids Research
Volume: 44
Issue number: 6
DOIs: https://doi.org/10.1093/nar/gkw120
State: Published - Feb 28 2016

Fingerprint

Mutation
Proteins
Aptitude
Structural Models
Proteome
Sequence Analysis
Virulence
Inflammation
Phenotype
Autism Spectrum Disorder

ASJC Scopus subject areas

  • Genetics

Cite this

Baugh, E. H., Simmons-Edler, R., Müller, C. L., Alford, R. F., Volfovsky, N., Lash, A. E., & Bonneau, R. (2016). Robust classification of protein variation using structural modelling and large-scale data integration. Nucleic Acids Research, 44(6), 2501-2513. https://doi.org/10.1093/nar/gkw120

@article{021ed2005f824bf0af124be250028642,
title = "Robust classification of protein variation using structural modelling and large-scale data integration",
abstract = "Existing methods for interpreting protein variation focus on annotating mutation pathogenicity rather than detailed interpretation of variant deleteriousness and frequently use only sequence-based or structure-based information. We present VIPUR, a computational framework that seamlessly integrates sequence analysis and structural modelling (using the Rosetta protein modelling suite) to identify and interpret deleterious protein variants. To train VIPUR, we collected 9477 protein variants with known effects on protein function from multiple organisms and curated structural models for each variant from crystal structures and homology models. VIPUR can be applied to mutations in any organism's proteome with improved generalized accuracy (AUROC 0.83) and interpretability (AUPR 0.87) compared to other methods. We demonstrate that VIPUR's predictions of deleteriousness match the biological phenotypes in ClinVar and provide a clear ranking of prediction confidence. We use VIPUR to interpret known mutations associated with inflammation and diabetes, demonstrating the structural diversity of disrupted functional sites and improved interpretation of mutations associated with human diseases. Lastly, we demonstrate VIPUR's ability to highlight candidate variants associated with human diseases by applying VIPUR to de novo variants associated with autism spectrum disorders.",
author = "Baugh, {Evan H.} and Riley Simmons-Edler and M{\"u}ller, {Christian L.} and Alford, {Rebecca F.} and Natalia Volfovsky and Lash, {Alex E.} and Richard Bonneau",
year = "2016",
month = "2",
day = "28",
doi = "10.1093/nar/gkw120",
language = "English (US)",
volume = "44",
pages = "2501--2513",
journal = "Nucleic Acids Research",
issn = "0305-1048",
publisher = "Oxford University Press",
number = "6",

}

TY - JOUR
T1 - Robust classification of protein variation using structural modelling and large-scale data integration
AU - Baugh, Evan H.
AU - Simmons-Edler, Riley
AU - Müller, Christian L.
AU - Alford, Rebecca F.
AU - Volfovsky, Natalia
AU - Lash, Alex E.
AU - Bonneau, Richard
PY - 2016/2/28
Y1 - 2016/2/28
AB - Existing methods for interpreting protein variation focus on annotating mutation pathogenicity rather than detailed interpretation of variant deleteriousness and frequently use only sequence-based or structure-based information. We present VIPUR, a computational framework that seamlessly integrates sequence analysis and structural modelling (using the Rosetta protein modelling suite) to identify and interpret deleterious protein variants. To train VIPUR, we collected 9477 protein variants with known effects on protein function from multiple organisms and curated structural models for each variant from crystal structures and homology models. VIPUR can be applied to mutations in any organism's proteome with improved generalized accuracy (AUROC 0.83) and interpretability (AUPR 0.87) compared to other methods. We demonstrate that VIPUR's predictions of deleteriousness match the biological phenotypes in ClinVar and provide a clear ranking of prediction confidence. We use VIPUR to interpret known mutations associated with inflammation and diabetes, demonstrating the structural diversity of disrupted functional sites and improved interpretation of mutations associated with human diseases. Lastly, we demonstrate VIPUR's ability to highlight candidate variants associated with human diseases by applying VIPUR to de novo variants associated with autism spectrum disorders.
UR - http://www.scopus.com/inward/record.url?scp=84963788750&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84963788750&partnerID=8YFLogxK
U2 - 10.1093/nar/gkw120
DO - 10.1093/nar/gkw120
M3 - Article
C2 - 26926108
AN - SCOPUS:84963788750
VL - 44
SP - 2501
EP - 2513
JO - Nucleic Acids Research
JF - Nucleic Acids Research
SN - 0305-1048
IS - 6
ER -