The Proteome Folding Project: Proteome-scale prediction of structure and function

Kevin Drew, Patrick Winters, Glenn L. Butterfoss, Viktors Berstis, Keith Uplinger, Jonathan Armstrong, Michael Riffle, Erik Schweighofer, Bill Bovermann, David R. Goodlett, Trisha N. Davis, Dennis Shasha, Lars Malmström, Richard Bonneau

Research output: Contribution to journal › Article

Abstract

The incompleteness of proteome structure and function annotation is a critical problem for biologists and, in particular, severely limits interpretation of high-throughput and next-generation experiments. We have developed a proteome annotation pipeline based on structure prediction, where function and structure annotations are generated using an integration of sequence comparison, fold recognition, and grid-computing-enabled de novo structure prediction. We predict protein domain boundaries and three-dimensional (3D) structures for protein domains from 94 genomes (including human, Arabidopsis, rice, mouse, fly, yeast, Escherichia coli, and worm). De novo structure predictions were distributed on a grid of more than 1.5 million CPUs worldwide (World Community Grid). We generated significant numbers of new confident fold annotations (9% of domains that are otherwise unannotated in these genomes). We demonstrate that predicted structures can be combined with annotations from the Gene Ontology database to predict new and more specific molecular functions.

Original language: English (US)
Pages (from-to): 1981-1994
Number of pages: 14
Journal: Genome Research
Volume: 21
Issue number: 11
DOIs: https://doi.org/10.1101/gr.121475.111
State: Published - Nov 2011

Fingerprint

Proteome
Gene Ontology
Human Genome
Arabidopsis
Diptera
Yeasts
Genome
Databases
Escherichia coli
Protein Domains
Oryza

ASJC Scopus subject areas

  • Genetics
  • Genetics (clinical)

Cite this

The Proteome Folding Project: Proteome-scale prediction of structure and function. / Drew, Kevin; Winters, Patrick; Butterfoss, Glenn L.; Berstis, Viktors; Uplinger, Keith; Armstrong, Jonathan; Riffle, Michael; Schweighofer, Erik; Bovermann, Bill; Goodlett, David R.; Davis, Trisha N.; Shasha, Dennis; Malmström, Lars; Bonneau, Richard.

In: Genome Research, Vol. 21, No. 11, 11.2011, p. 1981-1994.

Drew, K, Winters, P, Butterfoss, GL, Berstis, V, Uplinger, K, Armstrong, J, Riffle, M, Schweighofer, E, Bovermann, B, Goodlett, DR, Davis, TN, Shasha, D, Malmström, L & Bonneau, R 2011, 'The Proteome Folding Project: Proteome-scale prediction of structure and function', Genome Research, vol. 21, no. 11, pp. 1981-1994. https://doi.org/10.1101/gr.121475.111
Drew, Kevin; Winters, Patrick; Butterfoss, Glenn L.; Berstis, Viktors; Uplinger, Keith; Armstrong, Jonathan; Riffle, Michael; Schweighofer, Erik; Bovermann, Bill; Goodlett, David R.; Davis, Trisha N.; Shasha, Dennis; Malmström, Lars; Bonneau, Richard. / The Proteome Folding Project: Proteome-scale prediction of structure and function. In: Genome Research. 2011; Vol. 21, No. 11. pp. 1981-1994.
@article{fa81de32dace47f69d83b11031bd6e99,
title = "The Proteome Folding Project: Proteome-scale prediction of structure and function",
abstract = "The incompleteness of proteome structure and function annotation is a critical problem for biologists and, in particular, severely limits interpretation of high-throughput and next-generation experiments. We have developed a proteome annotation pipeline based on structure prediction, where function and structure annotations are generated using an integration of sequence comparison, fold recognition, and grid-computing-enabled de novo structure prediction. We predict protein domain boundaries and three-dimensional (3D) structures for protein domains from 94 genomes (including human, Arabidopsis, rice, mouse, fly, yeast, Escherichia coli, and worm). De novo structure predictions were distributed on a grid of more than 1.5 million CPUs worldwide (World Community Grid). We generated significant numbers of new confident fold annotations (9{\%} of domains that are otherwise unannotated in these genomes). We demonstrate that predicted structures can be combined with annotations from the Gene Ontology database to predict new and more specific molecular functions.",
author = "Kevin Drew and Patrick Winters and Butterfoss, {Glenn L.} and Viktors Berstis and Keith Uplinger and Jonathan Armstrong and Michael Riffle and Erik Schweighofer and Bill Bovermann and Goodlett, {David R.} and Davis, {Trisha N.} and Dennis Shasha and Lars Malmstr{\"o}m and Richard Bonneau",
year = "2011",
month = "11",
doi = "10.1101/gr.121475.111",
language = "English (US)",
volume = "21",
pages = "1981--1994",
journal = "Genome Research",
issn = "1088-9051",
publisher = "Cold Spring Harbor Laboratory Press",
number = "11",

}

TY - JOUR
T1 - The Proteome Folding Project
T2 - Proteome-scale prediction of structure and function
AU - Drew, Kevin
AU - Winters, Patrick
AU - Butterfoss, Glenn L.
AU - Berstis, Viktors
AU - Uplinger, Keith
AU - Armstrong, Jonathan
AU - Riffle, Michael
AU - Schweighofer, Erik
AU - Bovermann, Bill
AU - Goodlett, David R.
AU - Davis, Trisha N.
AU - Shasha, Dennis
AU - Malmström, Lars
AU - Bonneau, Richard
PY - 2011/11
Y1 - 2011/11
N2 - The incompleteness of proteome structure and function annotation is a critical problem for biologists and, in particular, severely limits interpretation of high-throughput and next-generation experiments. We have developed a proteome annotation pipeline based on structure prediction, where function and structure annotations are generated using an integration of sequence comparison, fold recognition, and grid-computing-enabled de novo structure prediction. We predict protein domain boundaries and three-dimensional (3D) structures for protein domains from 94 genomes (including human, Arabidopsis, rice, mouse, fly, yeast, Escherichia coli, and worm). De novo structure predictions were distributed on a grid of more than 1.5 million CPUs worldwide (World Community Grid). We generated significant numbers of new confident fold annotations (9% of domains that are otherwise unannotated in these genomes). We demonstrate that predicted structures can be combined with annotations from the Gene Ontology database to predict new and more specific molecular functions.
UR - http://www.scopus.com/inward/record.url?scp=80555142938&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80555142938&partnerID=8YFLogxK
U2 - 10.1101/gr.121475.111
DO - 10.1101/gr.121475.111
M3 - Article
C2 - 21824995
AN - SCOPUS:80555142938
VL - 21
SP - 1981
EP - 1994
JO - Genome Research
JF - Genome Research
SN - 1088-9051
IS - 11
ER -