SUPERFAMILY - Sophisticated comparative genomics, data mining, visualization and phylogeny

Derek Wilson, Ralph Pethica, Yiduo Zhou, Charles Talbot, Christine Vogel, Martin Madera, Cyrus Chothia, Julian Gough

Research output: Contribution to journal › Article

Abstract

SUPERFAMILY provides structural, functional and evolutionary information for proteins from all completely sequenced genomes, and large sequence collections such as UniProt. Protein domain assignments for over 900 genomes are included in the database, which can be accessed at http://supfam.org/. Hidden Markov models based on Structural Classification of Proteins (SCOP) domain definitions at the superfamily level are used to provide structural annotation. We recently produced a new model library based on SCOP 1.73. Family level assignments are also available. From the web site, users can submit sequences for SCOP domain classification; search for keywords such as superfamilies, families, organism names, models and sequence identifiers; find over- and underrepresented families or superfamilies within a genome relative to other genomes or groups of genomes; compare domain architectures across selections of genomes; and, finally, build multiple sequence alignments between Protein Data Bank (PDB), genomic and custom sequences. Recent extensions to the database include InterPro abstracts and Gene Ontology terms for superfamilies, taxonomic visualization of the distribution of families across the tree of life, searches for functionally similar domain architectures and phylogenetic trees. The database, models and associated scripts are available for download from the ftp site.

Original language: English (US)
Journal: Nucleic Acids Research
Volume: 37
Issue number: SUPPL. 1
DOI: https://doi.org/10.1093/nar/gkn762
State: Published - 2009

Fingerprint

Data Mining
Phylogeny
Genomics
Genome
Databases
Gene Ontology
Proteins
Sequence Alignment
Pedigree
Libraries
Names
Protein Domains

ASJC Scopus subject areas

  • Genetics

Cite this

SUPERFAMILY - Sophisticated comparative genomics, data mining, visualization and phylogeny. / Wilson, Derek; Pethica, Ralph; Zhou, Yiduo; Talbot, Charles; Vogel, Christine; Madera, Martin; Chothia, Cyrus; Gough, Julian.

In: Nucleic Acids Research, Vol. 37, No. SUPPL. 1, 2009.

Wilson, D, Pethica, R, Zhou, Y, Talbot, C, Vogel, C, Madera, M, Chothia, C & Gough, J 2009, 'SUPERFAMILY - Sophisticated comparative genomics, data mining, visualization and phylogeny', Nucleic Acids Research, vol. 37, no. SUPPL. 1. https://doi.org/10.1093/nar/gkn762
Wilson, Derek; Pethica, Ralph; Zhou, Yiduo; Talbot, Charles; Vogel, Christine; Madera, Martin; Chothia, Cyrus; Gough, Julian. / SUPERFAMILY - Sophisticated comparative genomics, data mining, visualization and phylogeny. In: Nucleic Acids Research. 2009; Vol. 37, No. SUPPL. 1.
@article{23681b7e2a5f48fe850eaeb6d3082b49,
title = "SUPERFAMILY - Sophisticated comparative genomics, data mining, visualization and phylogeny",
abstract = "SUPERFAMILY provides structural, functional and evolutionary information for proteins from all completely sequenced genomes, and large sequence collections such as UniProt. Protein domain assignments for over 900 genomes are included in the database, which can be accessed at http://supfam.org/. Hidden Markov models based on Structural Classification of Proteins (SCOP) domain definitions at the superfamily level are used to provide structural annotation. We recently produced a new model library based on SCOP 1.73. Family level assignments are also available. From the web site users can submit sequences for SCOP domain classification; search for keywords such as superfamilies, families, organism names, models and sequence identifiers; find over- and underrepresented families or superfamilies within a genome relative to other genomes or groups of genomes; compare domain architectures across selections of genomes and finally build multiple sequence alignments between Protein Data Bank (PDB), genomic and custom sequences. Recent extensions to the database include InterPro abstracts and Gene Ontology terms for superfamilies, taxonomic visualization of the distribution of families across the tree of life, searches for functionally similar domain architectures and phylogenetic trees. The database, models and associated scripts are available for download from the ftp site.",
author = "Derek Wilson and Ralph Pethica and Yiduo Zhou and Charles Talbot and Christine Vogel and Martin Madera and Cyrus Chothia and Julian Gough",
year = "2009",
doi = "10.1093/nar/gkn762",
language = "English (US)",
volume = "37",
journal = "Nucleic Acids Research",
issn = "0305-1048",
publisher = "Oxford University Press",
number = "SUPPL. 1",

}

TY - JOUR

T1 - SUPERFAMILY - Sophisticated comparative genomics, data mining, visualization and phylogeny

AU - Wilson, Derek

AU - Pethica, Ralph

AU - Zhou, Yiduo

AU - Talbot, Charles

AU - Vogel, Christine

AU - Madera, Martin

AU - Chothia, Cyrus

AU - Gough, Julian

PY - 2009

Y1 - 2009

AB - SUPERFAMILY provides structural, functional and evolutionary information for proteins from all completely sequenced genomes, and large sequence collections such as UniProt. Protein domain assignments for over 900 genomes are included in the database, which can be accessed at http://supfam.org/. Hidden Markov models based on Structural Classification of Proteins (SCOP) domain definitions at the superfamily level are used to provide structural annotation. We recently produced a new model library based on SCOP 1.73. Family level assignments are also available. From the web site users can submit sequences for SCOP domain classification; search for keywords such as superfamilies, families, organism names, models and sequence identifiers; find over- and underrepresented families or superfamilies within a genome relative to other genomes or groups of genomes; compare domain architectures across selections of genomes and finally build multiple sequence alignments between Protein Data Bank (PDB), genomic and custom sequences. Recent extensions to the database include InterPro abstracts and Gene Ontology terms for superfamilies, taxonomic visualization of the distribution of families across the tree of life, searches for functionally similar domain architectures and phylogenetic trees. The database, models and associated scripts are available for download from the ftp site.

UR - http://www.scopus.com/inward/record.url?scp=58149203228&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=58149203228&partnerID=8YFLogxK

U2 - 10.1093/nar/gkn762

DO - 10.1093/nar/gkn762

M3 - Article

VL - 37

JO - Nucleic Acids Research

JF - Nucleic Acids Research

SN - 0305-1048

IS - SUPPL. 1

ER -