Correlated Mutations and Homologous Recombination Within Bacterial Populations

Mingzhi Lin, Edo Kussell

Research output: Contribution to journal › Article

Abstract

Inferring the rate of homologous recombination within a bacterial population remains a key challenge in quantifying the basic parameters of bacterial evolution. Due to the high sequence similarity within a clonal population, and unique aspects of bacterial DNA transfer processes, detecting recombination events based on phylogenetic reconstruction is often difficult, and estimating recombination rates using coalescent model-based methods is computationally expensive, and often infeasible for large sequencing data sets. Here, we present an efficient solution by introducing a set of mutational correlation functions computed using pairwise sequence comparison, which characterize various facets of bacterial recombination. We provide analytical expressions for these functions, which precisely recapitulate simulation results of neutral and adapting populations under different coalescent models. We used these to fit correlation functions measured at synonymous substitutions using whole-genome data on Escherichia coli and Streptococcus pneumoniae populations. We calculated and corrected for the effect of sample selection bias, i.e., the uneven sampling of individuals from natural microbial populations that exists in most datasets. Our method is fast and efficient, and does not employ phylogenetic inference or other computationally intensive numerics. By simply fitting analytical forms to measurements from sequence data, we show that recombination rates can be inferred, and the relative ages of different samples can be estimated. Our approach, which is based on population genetic modeling, is broadly applicable to a wide variety of data, and its computational efficiency makes it particularly attractive for use in the analysis of large sequencing datasets.
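To illustrate what a correlation function "computed using pairwise sequence comparison" could look like in practice, here is a minimal Python sketch. It is not the authors' implementation and does not reproduce the paper's exact definitions or analytical forms. The function names and the two quantities it returns (a mean pairwise diversity `d` and a joint probability `q[l]` that two sites separated by `l` positions both differ) are assumptions chosen for this example, and the sketch treats every aligned site alike rather than restricting to synonymous substitutions.

```python
from itertools import combinations


def substitution_profile(seq_a, seq_b):
    """Return a 0/1 list marking the sites where two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [int(a != b) for a, b in zip(seq_a, seq_b)]


def pairwise_correlation(sequences, max_lag):
    """Average diversity and lagged joint-difference probability over all pairs.

    Returns (d, q), where d is the mean fraction of differing sites and
    q[l] is the mean fraction of site pairs (i, i + l) that both differ.
    """
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if not 0 <= max_lag < length:
        raise ValueError("max_lag must be in [0, sequence length)")

    d_total = 0.0
    q_total = [0.0] * (max_lag + 1)
    for seq_a, seq_b in pairs:
        s = substitution_profile(seq_a, seq_b)
        d_total += sum(s) / length
        for lag in range(max_lag + 1):
            m = length - lag  # number of site pairs available at this lag
            q_total[lag] += sum(s[i] * s[i + lag] for i in range(m)) / m

    n_pairs = len(pairs)
    return d_total / n_pairs, [x / n_pairs for x in q_total]


if __name__ == "__main__":
    # Toy aligned sequences, invented purely for illustration.
    d, q = pairwise_correlation(["AAAA", "AATT"], max_lag=2)
    print(d, q)
```

In a sketch like this, how `q[l]` changes with `l` relative to `d * d` is the kind of measurement to which an analytical form would be fitted. The fitting step and the sample selection bias correction described in the abstract are outside the scope of this example.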

Original language: English (US)
Pages (from-to): 891-917
Number of pages: 27
Journal: Genetics
Volume: 205
Issue number: 2
DOIs: 10.1534/genetics.116.189621
State: Published - Feb 1 2017

Fingerprint

Homologous Recombination
Genetic Recombination
Mutation
Population
Bacterial DNA
Selection Bias
Population Genetics
Streptococcus pneumoniae
Genome
Escherichia coli
Datasets

Keywords

  • adapting populations
  • bacteria
  • Bolthausen–Sznitman coalescent
  • homologous recombination
  • population diversity
  • sample ages
  • sample selection bias

ASJC Scopus subject areas

  • Genetics

Cite this

Correlated Mutations and Homologous Recombination Within Bacterial Populations. / Lin, Mingzhi; Kussell, Edo.

In: Genetics, Vol. 205, No. 2, 01.02.2017, p. 891-917.

@article{84a6d85acea84c199f06c0545be834f0,
title = "Correlated Mutations and Homologous Recombination Within Bacterial Populations",
keywords = "adapting populations, bacteria, Bolthausen–Sznitman coalescent, homologous recombination, population diversity, sample ages, sample selection bias",
author = "Mingzhi Lin and Edo Kussell",
year = "2017",
month = "2",
day = "1",
doi = "10.1534/genetics.116.189621",
language = "English (US)",
volume = "205",
pages = "891--917",
journal = "Genetics",
issn = "0016-6731",
publisher = "Genetics Society of America",
number = "2",

}

TY - JOUR

T1 - Correlated Mutations and Homologous Recombination Within Bacterial Populations

AU - Lin, Mingzhi

AU - Kussell, Edo

PY - 2017/2/1

Y1 - 2017/2/1

KW - adapting populations

KW - bacteria

KW - Bolthausen–Sznitman coalescent

KW - homologous recombination

KW - population diversity

KW - sample ages

KW - sample selection bias

UR - http://www.scopus.com/inward/record.url?scp=85021847801&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85021847801&partnerID=8YFLogxK

U2 - 10.1534/genetics.116.189621

DO - 10.1534/genetics.116.189621

M3 - Article

VL - 205

SP - 891

EP - 917

JO - Genetics

JF - Genetics

SN - 0016-6731

IS - 2

ER -