Computational generation and screening of RNA motifs in large nucleotide sequence pools

Namhee Kim, Joseph A. Izzo, Shereef Elmetwaly, Hin Hark Gan, Tamar Schlick

Research output: Contribution to journal › Article

Abstract

Although identification of active motifs in large random sequence pools is central to RNA in vitro selection, no systematic computational equivalent of this process has yet been developed. We develop a computational approach that combines target pool generation, motif scanning and motif screening using secondary structure analysis for applications to pools of 10^12 to 10^14 sequences; large pool sizes are made possible using program redesign and supercomputing resources. We use the new protocol to search for aptamer and ribozyme motifs in pools up to experimental pool size (10^14 sequences). We show that motif scanning, structure matching and flanking sequence analysis, respectively, reduce the initial sequence pool by 6-8, 1-2 and 1 orders of magnitude, consistent with the rare occurrence of active motifs in random pools. The final yields match the theoretical yields from probability theory for simple motifs and overestimate experimental yields, which constitute lower bounds, for aptamers because screening analyses beyond secondary structure information are not considered systematically. We also show that designed pools using our nucleotide transition probability matrices can produce higher yields for RNA ligase motifs than random pools. Our methods for generating, analyzing and designing large pools can help improve RNA design via simulation of aspects of in vitro selection.
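The three stages named in the abstract (pool generation, motif scanning, comparison against a theoretical yield) can be illustrated at toy scale. The sketch below is not the authors' program: the function names, the reading of a "nucleotide transition probability matrix" as a first-order Markov chain over A, C, G, U, and the use of exact-substring matching in place of motif scanning plus secondary-structure screening are all assumptions made for illustration. The theoretical figure it reports is the standard expected count of a fixed m-nucleotide motif in N uniform random sequences of length L, N(L − m + 1)4^(−m).

```python
import random

NUCS = "ACGU"


def generate_pool(n_seqs, length, transition=None, seed=0):
    """Generate n_seqs random RNA sequences of the given length.

    transition (hypothetical): 4x4 matrix where transition[i][j] is the
    probability that NUCS[j] follows NUCS[i]. None means uniform sampling.
    """
    rng = random.Random(seed)
    pool = []
    for _ in range(n_seqs):
        seq = [rng.choice(NUCS)]  # first nucleotide drawn uniformly
        for _ in range(length - 1):
            if transition is None:
                seq.append(rng.choice(NUCS))
            else:
                # First-order Markov step conditioned on the previous nucleotide
                row = transition[NUCS.index(seq[-1])]
                seq.append(rng.choices(NUCS, weights=row)[0])
        pool.append("".join(seq))
    return pool


def scan_motif(pool, motif):
    """Keep sequences that contain the motif as an exact substring.

    Stands in for motif scanning only; the secondary-structure matching and
    flanking-sequence screens described in the abstract are not modeled.
    """
    return [s for s in pool if motif in s]


def expected_occurrences(n_seqs, length, motif_len):
    """Expected number of exact motif occurrences in a uniform random pool.

    Each of the (length - motif_len + 1) windows per sequence matches a fixed
    motif with probability 4**-motif_len. Counting occurrences rather than
    sequences makes this an upper bound on the expected number of hits.
    """
    return n_seqs * (length - motif_len + 1) * 4.0 ** (-motif_len)


if __name__ == "__main__":
    pool = generate_pool(2000, 40)
    hits = scan_motif(pool, "GAAA")
    print(len(hits), expected_occurrences(2000, 40, 4))
```

For 2,000 uniform random 40-mers and the 4-nt motif GAAA, the expected-occurrence bound is 2000 × 37 / 256 = 289.0625, and the observed count of sequences containing the motif should fall somewhat below it. A pool of this size is only a stand-in; reaching 10^12 to 10^14 sequences is what the program redesign and supercomputing resources mentioned above are for.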

Original language: English (US)
Article number: gkq282
Journal: Nucleic Acids Research
Volume: 38
Issue number: 13
DOI: 10.1093/nar/gkq282
State: Published - May 6 2010

ASJC Scopus subject areas

  • Genetics

Cite this

Computational generation and screening of RNA motifs in large nucleotide sequence pools. / Kim, Namhee; Izzo, Joseph A.; Elmetwaly, Shereef; Gan, Hin Hark; Schlick, Tamar.

In: Nucleic Acids Research, Vol. 38, No. 13, gkq282, 06.05.2010.

Kim, Namhee ; Izzo, Joseph A. ; Elmetwaly, Shereef ; Gan, Hin Hark ; Schlick, Tamar. / Computational generation and screening of RNA motifs in large nucleotide sequence pools. In: Nucleic Acids Research. 2010 ; Vol. 38, No. 13.
@article{7f0d5d3b99bd425d80c92fda5c3d8c06,
title = "Computational generation and screening of RNA motifs in large nucleotide sequence pools",
abstract = "Although identification of active motifs in large random sequence pools is central to RNA in vitro selection, no systematic computational equivalent of this process has yet been developed. We develop a computational approach that combines target pool generation, motif scanning and motif screening using secondary structure analysis for applications to pools of $10^{12}$ to $10^{14}$ sequences; large pool sizes are made possible using program redesign and supercomputing resources. We use the new protocol to search for aptamer and ribozyme motifs in pools up to experimental pool size ($10^{14}$ sequences). We show that motif scanning, structure matching and flanking sequence analysis, respectively, reduce the initial sequence pool by 6-8, 1-2 and 1 orders of magnitude, consistent with the rare occurrence of active motifs in random pools. The final yields match the theoretical yields from probability theory for simple motifs and overestimate experimental yields, which constitute lower bounds, for aptamers because screening analyses beyond secondary structure information are not considered systematically. We also show that designed pools using our nucleotide transition probability matrices can produce higher yields for RNA ligase motifs than random pools. Our methods for generating, analyzing and designing large pools can help improve RNA design via simulation of aspects of in vitro selection.",
author = "Namhee Kim and Izzo, {Joseph A.} and Shereef Elmetwaly and Gan, {Hin Hark} and Tamar Schlick",
year = "2010",
month = "5",
day = "6",
doi = "10.1093/nar/gkq282",
language = "English (US)",
volume = "38",
journal = "Nucleic Acids Research",
issn = "0305-1048",
publisher = "Oxford University Press",
number = "13",

}

TY - JOUR

T1 - Computational generation and screening of RNA motifs in large nucleotide sequence pools

AU - Kim, Namhee

AU - Izzo, Joseph A.

AU - Elmetwaly, Shereef

AU - Gan, Hin Hark

AU - Schlick, Tamar

PY - 2010/5/6

Y1 - 2010/5/6

N2 - Although identification of active motifs in large random sequence pools is central to RNA in vitro selection, no systematic computational equivalent of this process has yet been developed. We develop a computational approach that combines target pool generation, motif scanning and motif screening using secondary structure analysis for applications to pools of 10^12 to 10^14 sequences; large pool sizes are made possible using program redesign and supercomputing resources. We use the new protocol to search for aptamer and ribozyme motifs in pools up to experimental pool size (10^14 sequences). We show that motif scanning, structure matching and flanking sequence analysis, respectively, reduce the initial sequence pool by 6-8, 1-2 and 1 orders of magnitude, consistent with the rare occurrence of active motifs in random pools. The final yields match the theoretical yields from probability theory for simple motifs and overestimate experimental yields, which constitute lower bounds, for aptamers because screening analyses beyond secondary structure information are not considered systematically. We also show that designed pools using our nucleotide transition probability matrices can produce higher yields for RNA ligase motifs than random pools. Our methods for generating, analyzing and designing large pools can help improve RNA design via simulation of aspects of in vitro selection.

AB - Although identification of active motifs in large random sequence pools is central to RNA in vitro selection, no systematic computational equivalent of this process has yet been developed. We develop a computational approach that combines target pool generation, motif scanning and motif screening using secondary structure analysis for applications to pools of 10^12 to 10^14 sequences; large pool sizes are made possible using program redesign and supercomputing resources. We use the new protocol to search for aptamer and ribozyme motifs in pools up to experimental pool size (10^14 sequences). We show that motif scanning, structure matching and flanking sequence analysis, respectively, reduce the initial sequence pool by 6-8, 1-2 and 1 orders of magnitude, consistent with the rare occurrence of active motifs in random pools. The final yields match the theoretical yields from probability theory for simple motifs and overestimate experimental yields, which constitute lower bounds, for aptamers because screening analyses beyond secondary structure information are not considered systematically. We also show that designed pools using our nucleotide transition probability matrices can produce higher yields for RNA ligase motifs than random pools. Our methods for generating, analyzing and designing large pools can help improve RNA design via simulation of aspects of in vitro selection.

UR - http://www.scopus.com/inward/record.url?scp=77954356679&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77954356679&partnerID=8YFLogxK

U2 - 10.1093/nar/gkq282

DO - 10.1093/nar/gkq282

M3 - Article

VL - 38

JO - Nucleic Acids Research

JF - Nucleic Acids Research

SN - 0305-1048

IS - 13

M1 - gkq282

ER -