The SUPERFAMILY database in 2004

Additions and improvements

Martin Madera, Christine Vogel, Sarah K. Kummerfeld, Cyrus Chothia, Julian Gough

Research output: Contribution to journal (Article)

Abstract

The SUPERFAMILY database provides structural assignments to protein sequences and a framework for analysis of the results. At the core of the database is a library of profile Hidden Markov Models that represent all proteins of known structure. The library is based on the SCOP classification of proteins: each model corresponds to a SCOP domain and aims to represent an entire superfamily. We have applied the library to predicted proteins from all completely sequenced genomes (currently 154), the Swiss-Prot and TrEMBL databases and other sequence collections. Close to 60% of all proteins have at least one match, and one half of all residues are covered by assignments. All models and full results are available for download and online browsing at http://supfam.org. Users can study the distribution of their superfamily of interest across all completely sequenced genomes, investigate with which other superfamilies it combines and retrieve proteins in which it occurs. Alternatively, concentrating on a particular genome as a whole, it is possible first, to find out its superfamily composition, and secondly, to compare it with that of other genomes to detect superfamilies that are over- or under-represented. In addition, the webserver provides the following standard services: sequence search; keyword search for genomes, superfamilies and sequence identifiers; and multiple alignment of genomic, PDB and custom sequences.
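
As a rough illustration of the two kinds of statistics the abstract mentions (the share of proteins and residues covered by assignments, and the comparison of superfamily composition between genomes), the following minimal sketch works on toy data. Everything in it is an assumption made for illustration: the data layout, the protein and superfamily names (p1, sfA, and so on), and the function names are hypothetical and are not the SUPERFAMILY file formats or API. The abstract does not say how over- or under-representation is determined; a plain ratio of domain shares is used here as a stand-in.

```python
from collections import Counter

# Hypothetical toy genomes: protein id -> (sequence length, domain assignments).
# Each assignment is (start, end, superfamily) with 1-based inclusive coordinates.
genome_a = {
    "p1": (100, [(1, 50, "sfA")]),
    "p2": (200, [(1, 100, "sfA"), (90, 150, "sfB")]),
    "p3": (100, []),  # no structural assignment
}
genome_b = {
    "q1": (100, [(1, 50, "sfA")]),
    "q2": (100, [(1, 50, "sfB")]),
}


def coverage(proteins):
    """Return (fraction of proteins with >= 1 assignment, fraction of residues covered)."""
    matched = 0
    covered = 0
    total_residues = 0
    for length, domains in proteins.values():
        total_residues += length
        if domains:
            matched += 1
        # Collect residues in a set so overlapping assignments are counted once.
        residues = set()
        for start, end, _superfamily in domains:
            residues.update(range(start, end + 1))
        covered += len(residues)
    return matched / len(proteins), covered / total_residues


def superfamily_shares(proteins):
    """Share of all assigned domains that belongs to each superfamily."""
    counts = Counter(
        superfamily
        for _length, domains in proteins.values()
        for _start, _end, superfamily in domains
    )
    total = sum(counts.values())
    return {superfamily: n / total for superfamily, n in counts.items()}


def relative_representation(genome, reference):
    """Ratio of each superfamily's share in `genome` to its share in `reference`.

    Values above 1 suggest over-representation, values below 1 suggest
    under-representation. Superfamilies absent from `reference` map to None.
    """
    shares = superfamily_shares(genome)
    ref_shares = superfamily_shares(reference)
    return {
        sf: (share / ref_shares[sf] if sf in ref_shares else None)
        for sf, share in shares.items()
    }


if __name__ == "__main__":
    protein_fraction, residue_fraction = coverage(genome_a)
    print(f"proteins with a match: {protein_fraction:.2f}")
    print(f"residues covered:      {residue_fraction:.2f}")
    print(relative_representation(genome_a, genome_b))
```

A plain ratio ignores genome size and sampling noise, so a real comparison would likely need a statistical test on top of it; the sketch only shows the bookkeeping.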

Original language: English (US)
Journal: Nucleic Acids Research
Volume: 32
Issue number: DATABASE ISS.
State: Published - Jan 1 2004

Fingerprint

Databases
Genome
Libraries
Proteins

ASJC Scopus subject areas

  • Genetics

Cite this

Madera, M., Vogel, C., Kummerfeld, S. K., Chothia, C., & Gough, J. (2004). The SUPERFAMILY database in 2004: Additions and improvements. Nucleic Acids Research, 32(DATABASE ISS.).

@article{0e647d91159c41caa368c58187d0bd30,
title = "The SUPERFAMILY database in 2004: Additions and improvements",
abstract = "The SUPERFAMILY database provides structural assignments to protein sequences and a framework for analysis of the results. At the core of the database is a library of profile Hidden Markov Models that represent all proteins of known structure. The library is based on the SCOP classification of proteins: each model corresponds to a SCOP domain and aims to represent an entire superfamily. We have applied the library to predicted proteins from all completely sequenced genomes (currently 154), the Swiss-Prot and TrEMBL databases and other sequence collections. Close to 60{\%} of all proteins have at least one match, and one half of all residues are covered by assignments. All models and full results are available for download and online browsing at http://supfam.org. Users can study the distribution of their superfamily of interest across all completely sequenced genomes, investigate with which other superfamilies it combines and retrieve proteins in which it occurs. Alternatively, concentrating on a particular genome as a whole, it is possible first, to find out its superfamily composition, and secondly, to compare it with that of other genomes to detect superfamilies that are over- or under-represented. In addition, the webserver provides the following standard services: sequence search; keyword search for genomes, superfamilies and sequence identifiers; and multiple alignment of genomic, PDB and custom sequences.",
author = "Martin Madera and Christine Vogel and Kummerfeld, {Sarah K.} and Cyrus Chothia and Julian Gough",
year = "2004",
month = "1",
day = "1",
language = "English (US)",
volume = "32",
journal = "Nucleic Acids Research",
issn = "0305-1048",
publisher = "Oxford University Press",
number = "DATABASE ISS.",

}

TY - JOUR

T1 - The SUPERFAMILY database in 2004

T2 - Additions and improvements

AU - Madera, Martin

AU - Vogel, Christine

AU - Kummerfeld, Sarah K.

AU - Chothia, Cyrus

AU - Gough, Julian

PY - 2004/1/1

Y1 - 2004/1/1

N2 - The SUPERFAMILY database provides structural assignments to protein sequences and a framework for analysis of the results. At the core of the database is a library of profile Hidden Markov Models that represent all proteins of known structure. The library is based on the SCOP classification of proteins: each model corresponds to a SCOP domain and aims to represent an entire superfamily. We have applied the library to predicted proteins from all completely sequenced genomes (currently 154), the Swiss-Prot and TrEMBL databases and other sequence collections. Close to 60% of all proteins have at least one match, and one half of all residues are covered by assignments. All models and full results are available for download and online browsing at http://supfam.org. Users can study the distribution of their superfamily of interest across all completely sequenced genomes, investigate with which other superfamilies it combines and retrieve proteins in which it occurs. Alternatively, concentrating on a particular genome as a whole, it is possible first, to find out its superfamily composition, and secondly, to compare it with that of other genomes to detect superfamilies that are over- or under-represented. In addition, the webserver provides the following standard services: sequence search; keyword search for genomes, superfamilies and sequence identifiers; and multiple alignment of genomic, PDB and custom sequences.

UR - http://www.scopus.com/inward/record.url?scp=0347125294&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0347125294&partnerID=8YFLogxK

M3 - Article

VL - 32

JO - Nucleic Acids Research

JF - Nucleic Acids Research

SN - 0305-1048

IS - DATABASE ISS.

ER -