Computational Protein Design with Deep Learning Neural Networks

Jingxue Wang, Huali Cao, John Zhang, Yifei Qi

Research output: Contribution to journal › Article

Abstract

Computational protein design has a wide variety of applications. Despite its remarkable success, designing a protein for a given structure and function remains a challenging task. Meanwhile, the number of solved protein structures is increasing rapidly while the number of unique protein folds has leveled off, suggesting that more structural information is accumulating for each fold. Deep learning neural networks are a powerful method for learning from such large data sets and have shown superior performance in many machine learning fields. In this study, we applied the deep learning neural network approach to computational protein design, predicting the probability of each of the 20 natural amino acids at each residue position in a protein. A large set of protein structures was collected and a multi-layer neural network was constructed. A number of structural properties were extracted as input features, and the best network achieved an accuracy of 38.3%. Using the network output as residue-type restraints improves the average sequence identity when designing three natural proteins with Rosetta. Moreover, the predictions from our network show approximately 3% higher sequence identity than a previous method. Results from this study may benefit further development of computational protein design methods.
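
As a rough illustration of the kind of model the abstract describes, the sketch below maps a per-residue feature vector to a probability distribution over the 20 natural amino acids and scores a designed sequence by sequence identity. It is a minimal sketch under stated assumptions, not the authors' implementation: the single hidden layer, the layer sizes, the random weights, the feature count, and every function name here are hypothetical and chosen only for illustration.

```python
import math
import random

# One-letter codes for the 20 natural amino acids (output classes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def random_params(n_features, n_hidden, seed=0):
    """Hypothetical, untrained weights; a real model would learn these."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                for _ in range(rows)]

    return {
        "w1": matrix(n_hidden, n_features), "b1": [0.0] * n_hidden,
        "w2": matrix(len(AMINO_ACIDS), n_hidden), "b2": [0.0] * len(AMINO_ACIDS),
    }


def predict_residue_probabilities(features, params):
    """Forward pass: structural features -> probabilities of 20 amino acids."""
    hidden = relu(dense(features, params["w1"], params["b1"]))
    return softmax(dense(hidden, params["w2"], params["b2"]))


def sequence_identity(designed, native):
    """Fraction of positions where the designed residue matches the native one."""
    if len(designed) != len(native) or not native:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(a == b for a, b in zip(designed, native)) / len(native)


# Example: 5 made-up structural features for a single residue position.
params = random_params(n_features=5, n_hidden=8)
probs = predict_residue_probabilities([0.2, -1.0, 0.5, 0.0, 1.3], params)
best = AMINO_ACIDS[probs.index(max(probs))]  # most probable residue type
```

In a design setting, per-position probabilities like `probs` could plausibly be converted into residue-type restraints, and `sequence_identity` compares a designed sequence against the native one; how the paper does either is not specified in this record.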

Original language: English (US)
Article number: 6349
Journal: Scientific Reports
Volume: 8
Issue number: 1
DOIs: 10.1038/s41598-018-24760-x
State: Published - Dec 1 2018

Fingerprint

Neural networks
Proteins
Deep learning
Multilayer neural networks
Learning systems
Amino acids
Structural properties

ASJC Scopus subject areas

  • General

Cite this

Computational Protein Design with Deep Learning Neural Networks. / Wang, Jingxue; Cao, Huali; Zhang, John; Qi, Yifei.

In: Scientific Reports, Vol. 8, No. 1, 6349, 01.12.2018.

@article{21df8a4de2844f8bb242a87359d36a6c,
title = "Computational Protein Design with Deep Learning Neural Networks",
abstract = "Computational protein design has a wide variety of applications. Despite its remarkable success, designing a protein for a given structure and function is still a challenging task. On the other hand, the number of solved protein structures is rapidly increasing while the number of unique protein folds has reached a steady number, suggesting more structural information is being accumulated on each fold. Deep learning neural network is a powerful method to learn such big data set and has shown superior performance in many machine learning fields. In this study, we applied the deep learning neural network approach to computational protein design for predicting the probability of 20 natural amino acids on each residue in a protein. A large set of protein structures was collected and a multi-layer neural network was constructed. A number of structural properties were extracted as input features and the best network achieved an accuracy of 38.3{\%}. Using the network output as residue type restraints improves the average sequence identity in designing three natural proteins using Rosetta. Moreover, the predictions from our network show ~3{\%} higher sequence identity than a previous method. Results from this study may benefit further development of computational protein design methods.",
author = "Jingxue Wang and Huali Cao and John Zhang and Yifei Qi",
year = "2018",
month = "12",
day = "1",
doi = "10.1038/s41598-018-24760-x",
language = "English (US)",
volume = "8",
journal = "Scientific Reports",
issn = "2045-2322",
publisher = "Nature Publishing Group",
number = "1",

}

TY - JOUR

T1 - Computational Protein Design with Deep Learning Neural Networks

AU - Wang, Jingxue

AU - Cao, Huali

AU - Zhang, John

AU - Qi, Yifei

PY - 2018/12/1

Y1 - 2018/12/1

N2 - Computational protein design has a wide variety of applications. Despite its remarkable success, designing a protein for a given structure and function is still a challenging task. On the other hand, the number of solved protein structures is rapidly increasing while the number of unique protein folds has reached a steady number, suggesting more structural information is being accumulated on each fold. Deep learning neural network is a powerful method to learn such big data set and has shown superior performance in many machine learning fields. In this study, we applied the deep learning neural network approach to computational protein design for predicting the probability of 20 natural amino acids on each residue in a protein. A large set of protein structures was collected and a multi-layer neural network was constructed. A number of structural properties were extracted as input features and the best network achieved an accuracy of 38.3%. Using the network output as residue type restraints improves the average sequence identity in designing three natural proteins using Rosetta. Moreover, the predictions from our network show ~3% higher sequence identity than a previous method. Results from this study may benefit further development of computational protein design methods.

UR - http://www.scopus.com/inward/record.url?scp=85045916803&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85045916803&partnerID=8YFLogxK

U2 - 10.1038/s41598-018-24760-x

DO - 10.1038/s41598-018-24760-x

M3 - Article

C2 - 29679026

AN - SCOPUS:85045916803

VL - 8

JO - Scientific Reports

JF - Scientific Reports

SN - 2045-2322

IS - 1

M1 - 6349

ER -