De novo prediction of three-dimensional structures for major protein families

Richard Bonneau, Charlie E M Strauss, Carol A. Rohl, Dylan Chivian, Phillip Bradley, Lars Malmström, Tim Robertson, David Baker

Research output: Contribution to journal › Article

Abstract

We use the Rosetta de novo structure prediction method to produce three-dimensional structure models for all Pfam-A sequence families with average length under 150 residues and no link to any protein of known structure. To estimate the reliability of the predictions, the method was calibrated on 131 proteins of known structure. For approximately 60% of the proteins one of the top five models was correctly predicted for 50 or more residues, and for approximately 35%, the correct SCOP superfamily was identified in a structure-based search of the Protein Data Bank using one of the models. This performance is consistent with results from the fourth critical assessment of structure prediction (CASP4). Correct and incorrect predictions could be partially distinguished using a confidence function based on a combination of simulation convergence, protein length and the similarity of a given structure prediction to known protein structures. While the limited accuracy and reliability of the method precludes definitive conclusions, the Pfam models provide the only tertiary structure information available for the 12% of publicly available sequences represented by these large protein families.
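The headline calibration figure counts a protein as a success when any one of its five top-ranked models has at least 50 correctly predicted residues. The Python sketch below shows only that tally. It assumes the per-model counts of correctly predicted residues have already been computed by some structure comparison. The function name `top_n_success_rate` and the input layout are hypothetical, and the abstract does not say how a residue is judged correct.

```python
from typing import Dict, List

# Hypothetical input layout: for each calibration protein, the number of
# residues judged correctly predicted in each ranked model (best-ranked first).
# The criterion for "correct" (e.g. a superposition cutoff) is an assumption
# left to the caller; it is not defined in the abstract.
CorrectResidueCounts = Dict[str, List[int]]


def top_n_success_rate(counts: CorrectResidueCounts,
                       top_n: int = 5,
                       min_residues: int = 50) -> float:
    """Fraction of proteins for which at least one of the top_n models
    has min_residues or more correctly predicted residues."""
    if not counts:
        return 0.0
    successes = sum(
        1 for per_model in counts.values()
        if any(c >= min_residues for c in per_model[:top_n])
    )
    return successes / len(counts)


if __name__ == "__main__":
    # Toy data for illustration only; these are not results from the paper.
    toy = {
        "protA": [30, 62, 41, 12, 8, 70],  # success at rank 2
        "protB": [20, 25, 33, 40, 44, 90],  # 90 is at rank 6, outside the top five
        "protC": [55],                      # success at rank 1
        "protD": [],                        # no models
    }
    print(top_n_success_rate(toy))  # 0.5
```

Restricting the check to the first five entries mirrors the "top five models" wording. A model ranked sixth or lower does not count even if it is accurate.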

Original language: English (US)
Pages (from-to): 65-78
Number of pages: 14
Journal: Journal of Molecular Biology
Volume: 322
Issue number: 1
DOIs: https://doi.org/10.1016/S0022-2836(02)00698-8
State: Published - 2002

Fingerprint

Proteins
Databases

Keywords

  • Gene annotation
  • Pfam
  • Rosetta
  • Structural genomics
  • Structure prediction

ASJC Scopus subject areas

  • Virology

Cite this

Bonneau, R., Strauss, C. E. M., Rohl, C. A., Chivian, D., Bradley, P., Malmström, L., ... Baker, D. (2002). De novo prediction of three-dimensional structures for major protein families. Journal of Molecular Biology, 322(1), 65-78. https://doi.org/10.1016/S0022-2836(02)00698-8

De novo prediction of three-dimensional structures for major protein families. / Bonneau, Richard; Strauss, Charlie E M; Rohl, Carol A.; Chivian, Dylan; Bradley, Phillip; Malmström, Lars; Robertson, Tim; Baker, David.

In: Journal of Molecular Biology, Vol. 322, No. 1, 2002, p. 65-78.

Bonneau, R, Strauss, CEM, Rohl, CA, Chivian, D, Bradley, P, Malmström, L, Robertson, T & Baker, D 2002, 'De novo prediction of three-dimensional structures for major protein families', Journal of Molecular Biology, vol. 322, no. 1, pp. 65-78. https://doi.org/10.1016/S0022-2836(02)00698-8
Bonneau, Richard ; Strauss, Charlie E M ; Rohl, Carol A. ; Chivian, Dylan ; Bradley, Phillip ; Malmström, Lars ; Robertson, Tim ; Baker, David. / De novo prediction of three-dimensional structures for major protein families. In: Journal of Molecular Biology. 2002 ; Vol. 322, No. 1. pp. 65-78.
@article{c9e6bbdbbf4b4f3e8b1845de364c62e3,
title = "De novo prediction of three-dimensional structures for major protein families",
abstract = "We use the Rosetta de novo structure prediction method to produce three-dimensional structure models for all Pfam-A sequence families with average length under 150 residues and no link to any protein of known structure. To estimate the reliability of the predictions, the method was calibrated on 131 proteins of known structure. For approximately 60{\%} of the proteins one of the top five models was correctly predicted for 50 or more residues, and for approximately 35{\%}, the correct SCOP superfamily was identified in a structure-based search of the Protein Data Bank using one of the models. This performance is consistent with results from the fourth critical assessment of structure prediction (CASP4). Correct and incorrect predictions could be partially distinguished using a confidence function based on a combination of simulation convergence, protein length and the similarity of a given structure prediction to known protein structures. While the limited accuracy and reliability of the method precludes definitive conclusions, the Pfam models provide the only tertiary structure information available for the 12{\%} of publicly available sequences represented by these large protein families.",
keywords = "Gene annotation, Pfam, Rosetta, Structural genomics, Structure prediction",
author = "Richard Bonneau and Strauss, {Charlie E M} and Rohl, {Carol A.} and Dylan Chivian and Phillip Bradley and Lars Malmstr{\"o}m and Tim Robertson and David Baker",
year = "2002",
doi = "10.1016/S0022-2836(02)00698-8",
language = "English (US)",
volume = "322",
pages = "65--78",
journal = "Journal of Molecular Biology",
issn = "0022-2836",
publisher = "Academic Press Inc.",
number = "1",

}

TY  - JOUR
T1  - De novo prediction of three-dimensional structures for major protein families
AU  - Bonneau, Richard
AU  - Strauss, Charlie E M
AU  - Rohl, Carol A.
AU  - Chivian, Dylan
AU  - Bradley, Phillip
AU  - Malmström, Lars
AU  - Robertson, Tim
AU  - Baker, David
PY  - 2002
Y1  - 2002
AB  - We use the Rosetta de novo structure prediction method to produce three-dimensional structure models for all Pfam-A sequence families with average length under 150 residues and no link to any protein of known structure. To estimate the reliability of the predictions, the method was calibrated on 131 proteins of known structure. For approximately 60% of the proteins one of the top five models was correctly predicted for 50 or more residues, and for approximately 35%, the correct SCOP superfamily was identified in a structure-based search of the Protein Data Bank using one of the models. This performance is consistent with results from the fourth critical assessment of structure prediction (CASP4). Correct and incorrect predictions could be partially distinguished using a confidence function based on a combination of simulation convergence, protein length and the similarity of a given structure prediction to known protein structures. While the limited accuracy and reliability of the method precludes definitive conclusions, the Pfam models provide the only tertiary structure information available for the 12% of publicly available sequences represented by these large protein families.
KW  - Gene annotation
KW  - Pfam
KW  - Rosetta
KW  - Structural genomics
KW  - Structure prediction
UR  - http://www.scopus.com/inward/record.url?scp=0036968925&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=0036968925&partnerID=8YFLogxK
U2  - 10.1016/S0022-2836(02)00698-8
DO  - 10.1016/S0022-2836(02)00698-8
M3  - Article
C2  - 12215415
AN  - SCOPUS:0036968925
VL  - 322
SP  - 65
EP  - 78
JO  - Journal of Molecular Biology
JF  - Journal of Molecular Biology
SN  - 0022-2836
IS  - 1
ER  - 