Large-scale RACE approach for proactive experimental definition of C. elegans ORFeome

Kourosh Salehi-Ashtiani, Chenwei Lin, Tong Hao, Yun Shen, David Szeto, Xinping Yang, Lila Ghamsari, Hanjoo Lee, Changyu Fan, Ryan R. Murray, Stuart Milstein, Nenad Svrzikapa, Michael E. Cusick, Frederick P. Roth, David E. Hill, Marc Vidal

Research output: Contribution to journal (Article)

Abstract

Although a highly accurate sequence of the Caenorhabditis elegans genome has been available for 10 years, the exact transcript structures of many of its protein-coding genes remain unsettled. Approximately two-thirds of the ORFeome has been verified reactively by amplifying and cloning computationally predicted transcript models; still a full third of the ORFeome remains experimentally unverified. To fully identify the protein-coding potential of the worm genome including transcripts that may not satisfy existing heuristics for gene prediction, we developed a computational and experimental platform adapting rapid amplification of cDNA ends (RACE) for large-scale structural transcript annotation. We interrogated 2000 unverified protein-coding genes using this platform. We obtained RACE data for approximately two-thirds of the examined transcripts and reconstructed ORF and transcript models for close to 1000 of these. We defined untranslated regions, identified new exons, and redefined previously annotated exons. Our results show that as much as 20% of the C. elegans genome may be incorrectly annotated. Many annotation errors could be corrected proactively with our large-scale RACE platform.

Original language: English (US)
Pages (from-to): 2334-2342
Number of pages: 9
Journal: Genome Research
Volume: 19
Issue number: 12
DOI: https://doi.org/10.1101/gr.098640.109
State: Published - Dec 1 2009

Fingerprint

Complementary DNA
Caenorhabditis elegans
Genome
Exons
Untranslated Regions
Proteins
Open Reading Frames
Organism Cloning
Genes
Heuristics

ASJC Scopus subject areas

  • Genetics
  • Genetics (clinical)

Cite this

Large-scale RACE approach for proactive experimental definition of C. elegans ORFeome. / Salehi-Ashtiani, Kourosh; Lin, Chenwei; Hao, Tong; Shen, Yun; Szeto, David; Yang, Xinping; Ghamsari, Lila; Lee, Hanjoo; Fan, Changyu; Murray, Ryan R.; Milstein, Stuart; Svrzikapa, Nenad; Cusick, Michael E.; Roth, Frederick P.; Hill, David E.; Vidal, Marc.

In: Genome Research, Vol. 19, No. 12, 01.12.2009, p. 2334-2342.

Salehi-Ashtiani, K, Lin, C, Hao, T, Shen, Y, Szeto, D, Yang, X, Ghamsari, L, Lee, H, Fan, C, Murray, RR, Milstein, S, Svrzikapa, N, Cusick, ME, Roth, FP, Hill, DE & Vidal, M 2009, 'Large-scale RACE approach for proactive experimental definition of C. elegans ORFeome', Genome Research, vol. 19, no. 12, pp. 2334-2342. https://doi.org/10.1101/gr.098640.109
Salehi-Ashtiani, Kourosh ; Lin, Chenwei ; Hao, Tong ; Shen, Yun ; Szeto, David ; Yang, Xinping ; Ghamsari, Lila ; Lee, Hanjoo ; Fan, Changyu ; Murray, Ryan R. ; Milstein, Stuart ; Svrzikapa, Nenad ; Cusick, Michael E. ; Roth, Frederick P. ; Hill, David E. ; Vidal, Marc. / Large-scale RACE approach for proactive experimental definition of C. elegans ORFeome. In: Genome Research. 2009 ; Vol. 19, No. 12. pp. 2334-2342.
@article{748f24d0b1ad41ba994854097f1b0e53,
title = "Large-scale RACE approach for proactive experimental definition of C. elegans ORFeome",
abstract = "Although a highly accurate sequence of the Caenorhabditis elegans genome has been available for 10 years, the exact transcript structures of many of its protein-coding genes remain unsettled. Approximately two-thirds of the ORFeome has been verified reactively by amplifying and cloning computationally predicted transcript models; still a full third of the ORFeome remains experimentally unverified. To fully identify the protein-coding potential of the worm genome including transcripts that may not satisfy existing heuristics for gene prediction, we developed a computational and experimental platform adapting rapid amplification of cDNA ends (RACE) for large-scale structural transcript annotation. We interrogated 2000 unverified protein-coding genes using this platform. We obtained RACE data for approximately two-thirds of the examined transcripts and reconstructed ORF and transcript models for close to 1000 of these. We defined untranslated regions, identified new exons, and redefined previously annotated exons. Our results show that as much as 20{\%} of the C. elegans genome may be incorrectly annotated. Many annotation errors could be corrected proactively with our large-scale RACE platform.",
author = "Kourosh Salehi-Ashtiani and Chenwei Lin and Tong Hao and Yun Shen and David Szeto and Xinping Yang and Lila Ghamsari and Hanjoo Lee and Changyu Fan and Murray, {Ryan R.} and Stuart Milstein and Nenad Svrzikapa and Cusick, {Michael E.} and Roth, {Frederick P.} and Hill, {David E.} and Marc Vidal",
year = "2009",
month = "12",
day = "1",
doi = "10.1101/gr.098640.109",
language = "English (US)",
volume = "19",
pages = "2334--2342",
journal = "Genome Research",
issn = "1088-9051",
publisher = "Cold Spring Harbor Laboratory Press",
number = "12",

}

TY - JOUR

T1 - Large-scale RACE approach for proactive experimental definition of C. elegans ORFeome

AU - Salehi-Ashtiani, Kourosh

AU - Lin, Chenwei

AU - Hao, Tong

AU - Shen, Yun

AU - Szeto, David

AU - Yang, Xinping

AU - Ghamsari, Lila

AU - Lee, Hanjoo

AU - Fan, Changyu

AU - Murray, Ryan R.

AU - Milstein, Stuart

AU - Svrzikapa, Nenad

AU - Cusick, Michael E.

AU - Roth, Frederick P.

AU - Hill, David E.

AU - Vidal, Marc

PY - 2009/12/1

Y1 - 2009/12/1

N2 - Although a highly accurate sequence of the Caenorhabditis elegans genome has been available for 10 years, the exact transcript structures of many of its protein-coding genes remain unsettled. Approximately two-thirds of the ORFeome has been verified reactively by amplifying and cloning computationally predicted transcript models; still a full third of the ORFeome remains experimentally unverified. To fully identify the protein-coding potential of the worm genome including transcripts that may not satisfy existing heuristics for gene prediction, we developed a computational and experimental platform adapting rapid amplification of cDNA ends (RACE) for large-scale structural transcript annotation. We interrogated 2000 unverified protein-coding genes using this platform. We obtained RACE data for approximately two-thirds of the examined transcripts and reconstructed ORF and transcript models for close to 1000 of these. We defined untranslated regions, identified new exons, and redefined previously annotated exons. Our results show that as much as 20% of the C. elegans genome may be incorrectly annotated. Many annotation errors could be corrected proactively with our large-scale RACE platform.

UR - http://www.scopus.com/inward/record.url?scp=73249134953&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=73249134953&partnerID=8YFLogxK

U2 - 10.1101/gr.098640.109

DO - 10.1101/gr.098640.109

M3 - Article

VL - 19

SP - 2334

EP - 2342

JO - Genome Research

JF - Genome Research

SN - 1088-9051

IS - 12

ER -