Protein family expansions and biological complexity

Christine Vogel, Cyrus Chothia

Research output: Contribution to journalArticle

Abstract

During the course of evolution, new proteins are produced very largely as the result of gene duplication, divergence and, in many cases, combination. This means that proteins or protein domains belong to families or, in cases where their relationships can only be recognised on the basis of structure, superfamilies whose members descended from a common ancestor. The size of superfamilies can vary greatly. Also, during the course of evolution organisms of increasing complexity have arisen. In this paper we determine the identity of those superfamilies whose relative sizes in different organisms are highly correlated to the complexity of the organisms. As a measure of the complexity of 38 uni- and multicellular eukaryotes we took the number of different cell types of which they are composed. Of 1,219 superfamilies, there are 194 whose sizes in the 38 organisms are strongly correlated with the number of cell types in the organisms. We give outline descriptions of these superfamilies. Half are involved in extracellular processes or regulation and smaller proportions in other types of activity. Half of all superfamilies have no significant correlation with complexity. We also determined whether the expansions of large superfamilies correlate with each other. We found three large clusters of correlated expansions: one involves expansions in both vertebrates and plants, one just in vertebrates, and one just in plants. Our work identifies important protein families and provides one explanation of the discrepancy between the total number of genes and the apparent physiological complexity of eukaryotic organisms.

Original languageEnglish (US)
Pages (from-to)370-382
Number of pages13
JournalPLoS Computational Biology
Volume2
Issue number5
DOIs
StatePublished - May 2006

Fingerprint

Proteins
Protein
protein
Vertebrates
organisms
Cell Count
Gene Duplication
proteins
Genes
Eukaryota
vertebrate
Gene
vertebrates
Cell
Duplication
gene
Correlate
common ancestry
eukaryote
Discrepancy

ASJC Scopus subject areas

  • Cellular and Molecular Neuroscience
  • Ecology
  • Molecular Biology
  • Genetics
  • Ecology, Evolution, Behavior and Systematics
  • Modeling and Simulation
  • Computational Theory and Mathematics

Cite this

Protein family expansions and biological complexity. / Vogel, Christine; Chothia, Cyrus.

In: PLoS Computational Biology, Vol. 2, No. 5, 05.2006, p. 370-382.

Research output: Contribution to journalArticle

Vogel, Christine ; Chothia, Cyrus. / Protein family expansions and biological complexity. In: PLoS Computational Biology. 2006 ; Vol. 2, No. 5. pp. 370-382.
@article{1464932e97e54d4e8a1d3bbd85ee3486,
title = "Protein family expansions and biological complexity",
abstract = "During the course of evolution, new proteins are produced very largely as the result of gene duplication, divergence and, in many cases, combination. This means that proteins or protein domains belong to families or, in cases where their relationships can only be recognised on the basis of structure, superfamilies whose members descended from a common ancestor. The size of superfamilies can vary greatly. Also, during the course of evolution organisms of increasing complexity have arisen. In this paper we determine the identity of those superfamilies whose relative sizes in different organisms are highly correlated to the complexity of the organisms. As a measure of the complexity of 38 uni- and multicellular eukaryotes we took the number of different cell types of which they are composed. Of 1,219 superfamilies, there are 194 whose sizes in the 38 organisms are strongly correlated with the number of cell types in the organisms. We give outline descriptions of these superfamilies. Half are involved in extracellular processes or regulation and smaller proportions in other types of activity. Half of all superfamilies have no significant correlation with complexity. We also determined whether the expansions of large superfamilies correlate with each other. We found three large clusters of correlated expansions: one involves expansions in both vertebrates and plants, one just in vertebrates, and one just in plants. Our work identifies important protein families and provides one explanation of the discrepancy between the total number of genes and the apparent physiological complexity of eukaryotic organisms.",
author = "Christine Vogel and Cyrus Chothia",
year = "2006",
month = "5",
doi = "10.1371/journal.pcbi.0020048",
language = "English (US)",
volume = "2",
pages = "370--382",
journal = "PLoS Computational Biology",
issn = "1553-734X",
publisher = "Public Library of Science",
number = "5",

}

TY - JOUR

T1 - Protein family expansions and biological complexity

AU - Vogel, Christine

AU - Chothia, Cyrus

PY - 2006/5

Y1 - 2006/5

N2 - During the course of evolution, new proteins are produced very largely as the result of gene duplication, divergence and, in many cases, combination. This means that proteins or protein domains belong to families or, in cases where their relationships can only be recognised on the basis of structure, superfamilies whose members descended from a common ancestor. The size of superfamilies can vary greatly. Also, during the course of evolution organisms of increasing complexity have arisen. In this paper we determine the identity of those superfamilies whose relative sizes in different organisms are highly correlated to the complexity of the organisms. As a measure of the complexity of 38 uni- and multicellular eukaryotes we took the number of different cell types of which they are composed. Of 1,219 superfamilies, there are 194 whose sizes in the 38 organisms are strongly correlated with the number of cell types in the organisms. We give outline descriptions of these superfamilies. Half are involved in extracellular processes or regulation and smaller proportions in other types of activity. Half of all superfamilies have no significant correlation with complexity. We also determined whether the expansions of large superfamilies correlate with each other. We found three large clusters of correlated expansions: one involves expansions in both vertebrates and plants, one just in vertebrates, and one just in plants. Our work identifies important protein families and provides one explanation of the discrepancy between the total number of genes and the apparent physiological complexity of eukaryotic organisms.

AB - During the course of evolution, new proteins are produced very largely as the result of gene duplication, divergence and, in many cases, combination. This means that proteins or protein domains belong to families or, in cases where their relationships can only be recognised on the basis of structure, superfamilies whose members descended from a common ancestor. The size of superfamilies can vary greatly. Also, during the course of evolution organisms of increasing complexity have arisen. In this paper we determine the identity of those superfamilies whose relative sizes in different organisms are highly correlated to the complexity of the organisms. As a measure of the complexity of 38 uni- and multicellular eukaryotes we took the number of different cell types of which they are composed. Of 1,219 superfamilies, there are 194 whose sizes in the 38 organisms are strongly correlated with the number of cell types in the organisms. We give outline descriptions of these superfamilies. Half are involved in extracellular processes or regulation and smaller proportions in other types of activity. Half of all superfamilies have no significant correlation with complexity. We also determined whether the expansions of large superfamilies correlate with each other. We found three large clusters of correlated expansions: one involves expansions in both vertebrates and plants, one just in vertebrates, and one just in plants. Our work identifies important protein families and provides one explanation of the discrepancy between the total number of genes and the apparent physiological complexity of eukaryotic organisms.

UR - http://www.scopus.com/inward/record.url?scp=33646909100&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33646909100&partnerID=8YFLogxK

U2 - 10.1371/journal.pcbi.0020048

DO - 10.1371/journal.pcbi.0020048

M3 - Article

VL - 2

SP - 370

EP - 382

JO - PLoS Computational Biology

JF - PLoS Computational Biology

SN - 1553-734X

IS - 5

ER -