OrthologID: Automation of genome-scale ortholog identification within a parsimony framework

Joanna C. Chiu, Ernest K. Lee, Mary G. Egan, Indra Neil Sarkar, Gloria M. Coruzzi, Rob DeSalle

Research output: Contribution to journal (Article)

Abstract

Motivation: The determination of gene orthology is a prerequisite for mining and utilizing the rapidly increasing amount of sequence data for genome-scale phylogenetics and comparative genomic studies. Until now, most researchers have used pairwise distance comparison methods, such as BLAST, COG, RBH, RSD and INPARANOID, to determine gene orthology. In contrast, orthology determination within a character-based phylogenetic framework has not been applied on a genomic scale, owing to a lack of efficiency and automation.

Results: We have developed OrthologID, a Web application that automates the labor-intensive procedures of gene orthology determination within a character-based phylogenetic framework, thus making character-based orthology determination possible on a genomic scale. In addition to generating gene family trees and determining orthologous gene sets for complete genomes, OrthologID can identify the diagnostic characters that define each orthologous gene set, as well as the diagnostic characters responsible for classifying query sequences from other genomes into specific orthology groups. The OrthologID database currently includes several complete plant genomes (Arabidopsis thaliana, Oryza sativa and Populus trichocarpa) as well as a unicellular outgroup, Chlamydomonas reinhardtii. To improve the general utility of OrthologID beyond plant species, we plan to expand our sequence database to include the fully sequenced genomes of prokaryotes and other non-plant eukaryotes.

Original language: English (US)
Pages (from-to): 699-707
Number of pages: 9
Journal: Bioinformatics
Volume: 22
Issue number: 6
DOI: 10.1093/bioinformatics/btk040
State: Published - Mar 15 2006

Fingerprint

Parsimony
Automation
Identification (control systems)
Genome
Genes
Gene
Phylogenetics
Genomics
Diagnostics
Oryza sativa
Chlamydomonas reinhardtii
Databases
Populus
Plant Genome
Arabidopsis thaliana
Comparative Genomics
Pedigree
Web Application
Eukaryota

ASJC Scopus subject areas

  • Clinical Biochemistry
  • Computer Science Applications
  • Computational Theory and Mathematics

Cite this

OrthologID: Automation of genome-scale ortholog identification within a parsimony framework. / Chiu, Joanna C.; Lee, Ernest K.; Egan, Mary G.; Sarkar, Indra Neil; Coruzzi, Gloria M.; DeSalle, Rob.

In: Bioinformatics, Vol. 22, No. 6, 15.03.2006, p. 699-707.

@article{10a5a009830544d295fd8e2d34840aa7,
title = "OrthologID: Automation of genome-scale ortholog identification within a parsimony framework",
author = "Chiu, {Joanna C.} and Lee, {Ernest K.} and Egan, {Mary G.} and Sarkar, {Indra Neil} and Coruzzi, {Gloria M.} and DeSalle, Rob",
year = "2006",
month = "3",
day = "15",
doi = "10.1093/bioinformatics/btk040",
language = "English (US)",
volume = "22",
pages = "699--707",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "6",

}

TY - JOUR

T1 - OrthologID

T2 - Automation of genome-scale ortholog identification within a parsimony framework

AU - Chiu, Joanna C.

AU - Lee, Ernest K.

AU - Egan, Mary G.

AU - Sarkar, Indra Neil

AU - Coruzzi, Gloria M.

AU - DeSalle, Rob

PY - 2006/3/15

Y1 - 2006/3/15

UR - http://www.scopus.com/inward/record.url?scp=33645106540&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33645106540&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btk040

DO - 10.1093/bioinformatics/btk040

M3 - Article

C2 - 16410324

AN - SCOPUS:33645106540

VL - 22

SP - 699

EP - 707

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 6

ER -