Homology search for genes

Xuefeng Cui, Tomáš Vinař, Broňa Brejová, Dennis Shasha, Ming Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Motivation: Life science researchers often require an exhaustive list of protein coding genes similar to a given query gene. To find such genes, homology search tools, such as BLAST or PatternHunter, return a set of high-scoring pairs (HSPs). These HSPs then need to be correlated with existing sequence annotations, or assembled manually into putative gene structures. This process is error-prone and labor-intensive, especially in genomes without reliable gene annotation.

Results: We have developed a homology search solution that automates this process, and instead of HSPs returns complete gene structures. We achieve better sensitivity and specificity by adapting a hidden Markov model for gene finding to reflect features of the query gene. Compared to traditional homology search, our novel approach identifies splice sites much more reliably and can even locate exons that were lost in the query gene. On a testing set of 400 mouse query genes, we report 79% exon sensitivity and 80% exon specificity in the human genome based on orthologous genes annotated in NCBI HomoloGene. In the same set, we also found 50 (12%) gene structures with better protein alignment scores than the ones identified in HomoloGene.

© 2007 The Author(s).
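To make the exon-level figures above concrete, here is a minimal sketch of how exon sensitivity and exon specificity are commonly computed in gene-finding evaluations. The abstract does not spell out the exact definitions used, so two things here are assumptions: an exon counts as correct only if both boundaries match the annotation exactly, and "specificity" means the fraction of predicted exons that are correct (the usual gene-finding convention). The function name `exon_accuracy` and the toy coordinates are hypothetical.

```python
from typing import Iterable, Set, Tuple

# An exon is identified by (sequence name, start, end, strand).
Exon = Tuple[str, int, int, str]


def exon_accuracy(predicted: Iterable[Exon],
                  annotated: Iterable[Exon]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at the exon level.

    Assumption: an exon is correct only when both boundaries match exactly.
    sensitivity = correct exons / annotated exons
    specificity = correct exons / predicted exons
    """
    pred: Set[Exon] = set(predicted)
    ref: Set[Exon] = set(annotated)
    correct = len(pred & ref)
    sensitivity = correct / len(ref) if ref else 0.0
    specificity = correct / len(pred) if pred else 0.0
    return sensitivity, specificity


# Toy data (made up for illustration, not taken from the paper).
annotated = [
    ("chr1", 100, 200, "+"),
    ("chr1", 300, 420, "+"),
    ("chr1", 500, 610, "+"),
    ("chr1", 700, 790, "+"),
    ("chr1", 900, 980, "+"),
]
predicted = [
    ("chr1", 100, 200, "+"),   # exact match
    ("chr1", 300, 420, "+"),   # exact match
    ("chr1", 505, 610, "+"),   # wrong 5' boundary, counted as incorrect
    ("chr1", 700, 790, "+"),   # exact match
]

sensitivity, specificity = exon_accuracy(predicted, annotated)
print(f"exon sensitivity = {sensitivity:.2f}, exon specificity = {specificity:.2f}")
```

In this toy case three of the five annotated exons are recovered exactly and three of the four predictions are correct, so a single misplaced splice site lowers both measures at once.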

Original language: English (US)
Title of host publication: ISMB/ECCB 2007
DOI: https://doi.org/10.1093/bioinformatics/btm225
State: Published - 2007

Fingerprint

Homology
Genes
Gene
Exons
Scoring
Query
Specificity
Annotation
Genome
Molecular Sequence Annotation
Biological Science Disciplines
Protein
Human Genome
Proteins
Life sciences
Markov Model
Hidden Markov models
Research Personnel
Mouse
Alignment

ASJC Scopus subject areas

  • Clinical Biochemistry
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Biochemistry
  • Molecular Biology
  • Computational Mathematics
  • Statistics and Probability

Cite this

Cui, X., Vinař, T., Brejová, B., Shasha, D., & Li, M. (2007). Homology search for genes. In ISMB/ECCB 2007 https://doi.org/10.1093/bioinformatics/btm225

@inproceedings{36a5a2d190b44d248d38fa8ea21644a6,
title = "Homology search for genes",
abstract = "Motivation: Life science researchers often require an exhaustive list of protein coding genes similar to a given query gene. To find such genes, homology search tools, such as BLAST or PatternHunter, return a set of high-scoring pairs (HSPs). These HSPs then need to be correlated with existing sequence annotations, or assembled manually into putative gene structures. This process is error-prone and labor-intensive, especially in genomes without reliable gene annotation. Results: We have developed a homology search solution that automates this process, and instead of HSPs returns complete gene structures. We achieve better sensitivity and specificity by adapting a hidden Markov model for gene finding to reflect features of the query gene. Compared to traditional homology search, our novel approach identifies splice sites much more reliably and can even locate exons that were lost in the query gene. On a testing set of 400 mouse query genes, we report 79{\%} exon sensitivity and 80{\%} exon specificity in the human genome based on orthologous genes annotated in NCBI HomoloGene. In the same set, we also found 50 (12{\%}) gene structures with better protein alignment scores than the ones identified in HomoloGene. {\circledC} 2007 The Author(s).",
author = "Xuefeng Cui and Tom{\'a}{\v s} Vina{\v r} and Bro{\v n}a Brejov{\'a} and Dennis Shasha and Ming Li",
year = "2007",
doi = "10.1093/bioinformatics/btm225",
language = "English (US)",
booktitle = "ISMB/ECCB 2007",

}

TY - GEN

T1 - Homology search for genes

AU - Cui, Xuefeng

AU - Vinař, Tomáš

AU - Brejová, Broňa

AU - Shasha, Dennis

AU - Li, Ming

PY - 2007

Y1 - 2007

AB - Motivation: Life science researchers often require an exhaustive list of protein coding genes similar to a given query gene. To find such genes, homology search tools, such as BLAST or PatternHunter, return a set of high-scoring pairs (HSPs). These HSPs then need to be correlated with existing sequence annotations, or assembled manually into putative gene structures. This process is error-prone and labor-intensive, especially in genomes without reliable gene annotation. Results: We have developed a homology search solution that automates this process, and instead of HSPs returns complete gene structures. We achieve better sensitivity and specificity by adapting a hidden Markov model for gene finding to reflect features of the query gene. Compared to traditional homology search, our novel approach identifies splice sites much more reliably and can even locate exons that were lost in the query gene. On a testing set of 400 mouse query genes, we report 79% exon sensitivity and 80% exon specificity in the human genome based on orthologous genes annotated in NCBI HomoloGene. In the same set, we also found 50 (12%) gene structures with better protein alignment scores than the ones identified in HomoloGene. © 2007 The Author(s).

UR - http://www.scopus.com/inward/record.url?scp=34547852257&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34547852257&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btm225

DO - 10.1093/bioinformatics/btm225

M3 - Conference contribution

BT - ISMB/ECCB 2007

ER -