Evolutionary pressures on simple sequence repeats in prokaryotic coding regions

Wei Hsiang Lin, Edo Kussell

Research output: Contribution to journal (Article)

Abstract

Simple sequence repeats (SSRs) are indel mutational hotspots in genomes. In prokaryotes, SSR loci can cause phase variation, a microbial survival strategy that relies on stochastic, reversible on-off switching of gene activity. By analyzing multiple strains of 42 fully sequenced prokaryotic species, we measure the relative variability and density distribution of SSRs in coding regions. We demonstrate that repeat type strongly influences indel mutation rates, and that the most mutable types are most strongly avoided across genomes. We thoroughly characterize SSR density and variability as a function of N→C position along protein sequences. Using codon-shuffling algorithms that preserve amino acid sequence, we assess evolutionary pressures on SSRs. We find that coding sequences suppress repeats in the middle of proteins, and enrich repeats near termini, yielding U-shaped SSR density curves. We show that for many species this characteristic shape can be attributed to purely biophysical constraints of protein structure. In multiple cases, however, particularly in certain pathogenic bacteria, we observe over-enrichment of SSRs near protein N-termini significantly beyond expectation based on structural constraints. This increases the probability that frameshifts result in non-functional proteins, revealing that these species may evolutionarily tune SSR positions in coding regions to facilitate phase variation.
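
The codon-shuffling comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a clean, uppercase, in-frame coding sequence; treats only mononucleotide runs as SSRs; and uses one simple null model that permutes codons among positions encoding the same amino acid, which keeps both the protein sequence and the gene's codon usage fixed. The function names, the run-length cutoff and the number of N→C bins are hypothetical choices made for illustration.

```python
import random
import re
from collections import defaultdict

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate an in-frame CDS (length assumed to be a multiple of 3)."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def homopolymer_starts(seq, min_len=5):
    """Start offsets of mononucleotide runs of at least min_len bases.

    min_len is a hypothetical cutoff; runs longer than min_len count once.
    """
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [m.start() for m in pattern.finditer(seq)]


def synonymous_shuffle(cds, rng):
    """Permute codons among positions that encode the same amino acid.

    The encoded protein and the multiset of codons are unchanged; only the
    placement of synonymous codons moves (one possible null model).
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    positions = defaultdict(list)
    for idx, codon in enumerate(codons):
        positions[CODON_TABLE[codon]].append(idx)
    shuffled = codons[:]
    for idxs in positions.values():
        pool = [codons[i] for i in idxs]
        rng.shuffle(pool)
        for i, codon in zip(idxs, pool):
            shuffled[i] = codon
    return "".join(shuffled)


def position_profile(cds, n_bins=10, min_len=5):
    """Count SSR start sites in n_bins equal-width bins along the N->C axis."""
    counts = [0] * n_bins
    for start in homopolymer_starts(cds, min_len):
        counts[min(n_bins - 1, start * n_bins // len(cds))] += 1
    return counts


def enrichment(cds, n_shuffles=200, n_bins=10, min_len=5, seed=0):
    """Observed SSR counts per bin divided by the mean over synonymous shuffles.

    Bins where no shuffle produced an SSR are reported as NaN.
    """
    rng = random.Random(seed)
    observed = position_profile(cds, n_bins, min_len)
    expected = [0.0] * n_bins
    for _ in range(n_shuffles):
        shuffled = synonymous_shuffle(cds, rng)
        for b, count in enumerate(position_profile(shuffled, n_bins, min_len)):
            expected[b] += count / n_shuffles
    return [o / e if e > 0 else float("nan") for o, e in zip(observed, expected)]


if __name__ == "__main__":
    # Hypothetical toy CDS: Met, Lys codons, Leu codons, stop.
    toy = "ATG" + "AAAAAG" * 4 + "CTGCTTTTACTC" * 3 + "TAA"
    print(enrichment(toy, n_shuffles=100, n_bins=5, min_len=4))
```

On a single short gene the ratios are noisy; ratios above 1 in the first and last bins would only suggest terminal enrichment relative to this particular null model. A real analysis would pool many genes and, as the abstract notes, would also have to account for constraints imposed by protein structure.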

Original language: English (US)
Pages (from-to): 2399-2413
Number of pages: 15
Journal: Nucleic Acids Research
Volume: 40
Issue number: 6
DOI: 10.1093/nar/gkr1078
State: Published - Mar 2012

ASJC Scopus subject areas

  • Genetics

Cite this

Evolutionary pressures on simple sequence repeats in prokaryotic coding regions. / Lin, Wei Hsiang; Kussell, Edo.

In: Nucleic Acids Research, Vol. 40, No. 6, 03.2012, p. 2399-2413.

@article{8c43f0ee551f4cf1b2c8e7a9b778f6fe,
title = "Evolutionary pressures on simple sequence repeats in prokaryotic coding regions",
abstract = "Simple sequence repeats (SSRs) are indel mutational hotspots in genomes. In prokaryotes, SSR loci can cause phase variation, a microbial survival strategy that relies on stochastic, reversible on-off switching of gene activity. By analyzing multiple strains of 42 fully sequenced prokaryotic species, we measure the relative variability and density distribution of SSRs in coding regions. We demonstrate that repeat type strongly influences indel mutation rates, and that the most mutable types are most strongly avoided across genomes. We thoroughly characterize SSR density and variability as a function of N→C position along protein sequences. Using codon-shuffling algorithms that preserve amino acid sequence, we assess evolutionary pressures on SSRs. We find that coding sequences suppress repeats in the middle of proteins, and enrich repeats near termini, yielding U-shaped SSR density curves. We show that for many species this characteristic shape can be attributed to purely biophysical constraints of protein structure. In multiple cases, however, particularly in certain pathogenic bacteria, we observe over enrichment of SSRs near protein N-termini significantly beyond expectation based on structural constraints. This increases the probability that frameshifts result in non-functional proteins, revealing that these species may evolutionarily tune SSR positions in coding regions to facilitate phase variation.",
author = "Lin, {Wei Hsiang} and Edo Kussell",
year = "2012",
month = mar,
doi = "10.1093/nar/gkr1078",
language = "English (US)",
volume = "40",
pages = "2399--2413",
journal = "Nucleic Acids Research",
issn = "0305-1048",
publisher = "Oxford University Press",
number = "6",

}

TY  - JOUR
T1  - Evolutionary pressures on simple sequence repeats in prokaryotic coding regions
AU  - Lin, Wei Hsiang
AU  - Kussell, Edo
PY  - 2012/3
Y1  - 2012/3
N2  - Simple sequence repeats (SSRs) are indel mutational hotspots in genomes. In prokaryotes, SSR loci can cause phase variation, a microbial survival strategy that relies on stochastic, reversible on-off switching of gene activity. By analyzing multiple strains of 42 fully sequenced prokaryotic species, we measure the relative variability and density distribution of SSRs in coding regions. We demonstrate that repeat type strongly influences indel mutation rates, and that the most mutable types are most strongly avoided across genomes. We thoroughly characterize SSR density and variability as a function of N→C position along protein sequences. Using codon-shuffling algorithms that preserve amino acid sequence, we assess evolutionary pressures on SSRs. We find that coding sequences suppress repeats in the middle of proteins, and enrich repeats near termini, yielding U-shaped SSR density curves. We show that for many species this characteristic shape can be attributed to purely biophysical constraints of protein structure. In multiple cases, however, particularly in certain pathogenic bacteria, we observe over enrichment of SSRs near protein N-termini significantly beyond expectation based on structural constraints. This increases the probability that frameshifts result in non-functional proteins, revealing that these species may evolutionarily tune SSR positions in coding regions to facilitate phase variation.
AB  - Simple sequence repeats (SSRs) are indel mutational hotspots in genomes. In prokaryotes, SSR loci can cause phase variation, a microbial survival strategy that relies on stochastic, reversible on-off switching of gene activity. By analyzing multiple strains of 42 fully sequenced prokaryotic species, we measure the relative variability and density distribution of SSRs in coding regions. We demonstrate that repeat type strongly influences indel mutation rates, and that the most mutable types are most strongly avoided across genomes. We thoroughly characterize SSR density and variability as a function of N→C position along protein sequences. Using codon-shuffling algorithms that preserve amino acid sequence, we assess evolutionary pressures on SSRs. We find that coding sequences suppress repeats in the middle of proteins, and enrich repeats near termini, yielding U-shaped SSR density curves. We show that for many species this characteristic shape can be attributed to purely biophysical constraints of protein structure. In multiple cases, however, particularly in certain pathogenic bacteria, we observe over enrichment of SSRs near protein N-termini significantly beyond expectation based on structural constraints. This increases the probability that frameshifts result in non-functional proteins, revealing that these species may evolutionarily tune SSR positions in coding regions to facilitate phase variation.
UR  - http://www.scopus.com/inward/record.url?scp=84859323535&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84859323535&partnerID=8YFLogxK
U2  - 10.1093/nar/gkr1078
DO  - 10.1093/nar/gkr1078
M3  - Article
C2  - 22123746
AN  - SCOPUS:84859323535
VL  - 40
SP  - 2399
EP  - 2413
JO  - Nucleic Acids Research
JF  - Nucleic Acids Research
SN  - 0305-1048
IS  - 6
ER  - 