Genome-wide motif statistics are shaped by DNA binding proteins over evolutionary time scales

Long Qian, Edo Kussell

Research output: Contribution to journal › Article

Abstract

The composition of a genome with respect to all possible short DNA motifs impacts the ability of DNA binding proteins to locate and bind their target sites. Since nonfunctional DNA binding can be detrimental to cellular functions and ultimately to organismal fitness, organisms could benefit from reducing the number of nonfunctional DNA binding sites genome wide. Using in vitro measurements of binding affinities for a large collection of DNA binding proteins, in multiple species, we detect a significant global avoidance of weak binding sites in genomes. We demonstrate that the underlying evolutionary process leaves a distinct genomic hallmark in that similar words have correlated frequencies, a signal that we detect in all species across domains of life. We consider the possibility that natural selection against weak binding sites contributes to this process, and using an evolutionary model we show that the strength of selection needed to maintain global word compositions is on the order of point mutation rates. Likewise, we show that evolutionary mechanisms based on interference of protein-DNA binding with replication and mutational repair processes could yield similar results and operate with similar rates. On the basis of these modeling and bioinformatic results, we conclude that genome-wide word compositions have been molded by DNA binding proteins acting through tiny evolutionary steps over time scales spanning millions of generations.

Original language: English (US)
Article number: 041009
Journal: Physical Review X
Volume: 6
Issue number: 4
DOI: 10.1103/PhysRevX.6.041009
State: Published - 2016

Fingerprint

genome
deoxyribonucleic acid
statistics
proteins
fitness
avoidance
mutations
organisms
leaves
affinity
interference

Keywords

  • Biological Physics
  • Statistical Physics

ASJC Scopus subject areas

  • Physics and Astronomy(all)

Cite this

Genome-wide motif statistics are shaped by DNA binding proteins over evolutionary time scales. / Qian, Long; Kussell, Edo.

In: Physical Review X, Vol. 6, No. 4, 041009, 2016.

@article{143557abb44746a5bedad2c817609e51,
title = "Genome-wide motif statistics are shaped by DNA binding proteins over evolutionary time scales",
abstract = "The composition of a genome with respect to all possible short DNA motifs impacts the ability of DNA binding proteins to locate and bind their target sites. Since nonfunctional DNA binding can be detrimental to cellular functions and ultimately to organismal fitness, organisms could benefit from reducing the number of nonfunctional DNA binding sites genome wide. Using in vitro measurements of binding affinities for a large collection of DNA binding proteins, in multiple species, we detect a significant global avoidance of weak binding sites in genomes. We demonstrate that the underlying evolutionary process leaves a distinct genomic hallmark in that similar words have correlated frequencies, a signal that we detect in all species across domains of life. We consider the possibility that natural selection against weak binding sites contributes to this process, and using an evolutionary model we show that the strength of selection needed to maintain global word compositions is on the order of point mutation rates. Likewise, we show that evolutionary mechanisms based on interference of protein-DNA binding with replication and mutational repair processes could yield similar results and operate with similar rates. On the basis of these modeling and bioinformatic results, we conclude that genome-wide word compositions have been molded by DNA binding proteins acting through tiny evolutionary steps over time scales spanning millions of generations.",
keywords = "Biological Physics, Statistical Physics",
author = "Long Qian and Edo Kussell",
year = "2016",
doi = "10.1103/PhysRevX.6.041009",
language = "English (US)",
volume = "6",
journal = "Physical Review X",
issn = "2160-3308",
number = "4",
}

TY - JOUR

T1 - Genome-wide motif statistics are shaped by DNA binding proteins over evolutionary time scales

AU - Qian, Long

AU - Kussell, Edo

PY - 2016

Y1 - 2016

AB - The composition of a genome with respect to all possible short DNA motifs impacts the ability of DNA binding proteins to locate and bind their target sites. Since nonfunctional DNA binding can be detrimental to cellular functions and ultimately to organismal fitness, organisms could benefit from reducing the number of nonfunctional DNA binding sites genome wide. Using in vitro measurements of binding affinities for a large collection of DNA binding proteins, in multiple species, we detect a significant global avoidance of weak binding sites in genomes. We demonstrate that the underlying evolutionary process leaves a distinct genomic hallmark in that similar words have correlated frequencies, a signal that we detect in all species across domains of life. We consider the possibility that natural selection against weak binding sites contributes to this process, and using an evolutionary model we show that the strength of selection needed to maintain global word compositions is on the order of point mutation rates. Likewise, we show that evolutionary mechanisms based on interference of protein-DNA binding with replication and mutational repair processes could yield similar results and operate with similar rates. On the basis of these modeling and bioinformatic results, we conclude that genome-wide word compositions have been molded by DNA binding proteins acting through tiny evolutionary steps over time scales spanning millions of generations.

KW - Biological Physics

KW - Statistical Physics

UR - http://www.scopus.com/inward/record.url?scp=85008152789&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85008152789&partnerID=8YFLogxK

U2 - 10.1103/PhysRevX.6.041009

DO - 10.1103/PhysRevX.6.041009

M3 - Article

VL - 6

JO - Physical Review X

JF - Physical Review X

SN - 2160-3308

IS - 4

M1 - 041009

ER -