Inferring rates and length-distributions of indels using approximate Bayesian computation

Eli Levy Karin, Dafna Shkedy, Haim Ashkenazy, Reed Cartwright, Tal Pupko

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

The most common evolutionary events at the molecular level are single-base substitutions, as well as insertions and deletions (indels) of short DNA segments. A large body of research has been devoted to developing probabilistic substitution models and inferring their parameters using likelihood and Bayesian approaches. In contrast, relatively little has been done to model indel dynamics, probably due to the difficulty in writing explicit likelihood functions. Here, we contribute to the effort of modeling indel dynamics by presenting SpartaABC, an approximate Bayesian computation (ABC) approach to infer indel parameters from sequence data (either aligned or unaligned). SpartaABC circumvents the need to use an explicit likelihood function by extracting summary statistics from simulated sequences. First, summary statistics are extracted from the input sequence data. Second, SpartaABC samples indel parameters from a prior distribution and uses them to simulate sequences. Third, it computes summary statistics from the simulated sets of sequences. By computing a distance between the summary statistics extracted from the input and each simulation, SpartaABC can provide an approximation to the posterior distribution of indel parameters as well as point estimates. We study the performance of our methodology and show that it provides accurate estimates of indel parameters in simulations. We next demonstrate the utility of SpartaABC by studying the impact of alignment errors on the inference of positive selection. A C++ program implementing SpartaABC is freely available at http://spartaabc.tau.ac.il.
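
The three steps in the abstract follow the general pattern of ABC rejection sampling. The Python sketch below illustrates that pattern only; it is not the SpartaABC implementation. The toy simulator, the two summary statistics (indel count and mean indel length), the uniform priors, the scaled Euclidean distance, and all function names are assumptions made for illustration.

```python
import random
import statistics


def simulate_gap_lengths(indel_rate, mean_length, n_sites=2000, rng=random):
    """Toy simulator (an assumption, not SpartaABC's): each site starts an
    indel with probability indel_rate; lengths are geometric with the given mean."""
    p = 1.0 / mean_length
    lengths = []
    for _ in range(n_sites):
        if rng.random() < indel_rate:
            length = 1
            while rng.random() > p:  # extend the indel with probability ~(1 - p)
                length += 1
            lengths.append(length)
    return lengths


def summary_statistics(lengths):
    """Hypothetical summary statistics: number of indels and their mean length."""
    mean = statistics.fmean(lengths) if lengths else 0.0
    return (len(lengths), mean)


def distance(s_obs, s_sim, scales):
    """Euclidean distance between summary vectors after per-statistic scaling."""
    return sum(((a - b) / w) ** 2 for a, b, w in zip(s_obs, s_sim, scales)) ** 0.5


def abc_rejection(observed, n_simulations=2000, keep_fraction=0.05, seed=0):
    """Keep the parameter draws whose simulations fall closest to the input data."""
    rng = random.Random(seed)
    s_obs = summary_statistics(observed)          # step 1: summarize the input
    scales = tuple(max(abs(x), 1.0) for x in s_obs)
    draws = []
    for _ in range(n_simulations):
        rate = rng.uniform(0.0, 0.1)              # step 2: sample from the prior
        mean_len = rng.uniform(1.0, 10.0)         # (assumed uniform priors)
        simulated = simulate_gap_lengths(rate, mean_len, rng=rng)
        s_sim = summary_statistics(simulated)     # step 3: summarize the simulation
        draws.append((distance(s_obs, s_sim, scales), rate, mean_len))
    draws.sort()
    kept = draws[: max(1, int(keep_fraction * n_simulations))]
    posterior = [(rate, mean_len) for _, rate, mean_len in kept]
    point_estimate = (
        statistics.fmean([rate for rate, _ in posterior]),
        statistics.fmean([mean_len for _, mean_len in posterior]),
    )
    return posterior, point_estimate
```

Calling `abc_rejection(simulate_gap_lengths(0.03, 4.0))` returns the retained parameter pairs, which approximate the posterior, together with their means as point estimates. Retaining a fixed fraction of the closest simulations is one common acceptance rule; a fixed distance threshold is an alternative.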

Original language: English (US)
Pages (from-to): 1280-1294
Number of pages: 15
Journal: Genome Biology and Evolution
Volume: 9
Issue number: 5
DOI: 10.1093/gbe/evx084
State: Published - 2017

Fingerprint

statistics
Likelihood Functions
Bayes Theorem
substitution
Statistical Models
dynamic models
simulation
distribution
rate
parameter
DNA
Research
methodology
modeling
sampling

Keywords

  • Alignments
  • Approximate Bayesian computation
  • Indels
  • Simulations

ASJC Scopus subject areas

  • Medicine(all)
  • Ecology, Evolution, Behavior and Systematics
  • Genetics

Cite this

Inferring rates and length-distributions of indels using approximate Bayesian computation. / Karin, Eli Levy; Shkedy, Dafna; Ashkenazy, Haim; Cartwright, Reed; Pupko, Tal.

In: Genome Biology and Evolution, Vol. 9, No. 5, 2017, p. 1280-1294.

Karin, Eli Levy ; Shkedy, Dafna ; Ashkenazy, Haim ; Cartwright, Reed ; Pupko, Tal. / Inferring rates and length-distributions of indels using approximate Bayesian computation. In: Genome Biology and Evolution. 2017 ; Vol. 9, No. 5. pp. 1280-1294.
@article{0f1454bcd7c54015aedae0c40d6dc04b,
title = "Inferring rates and length-distributions of indels using approximate Bayesian computation",
abstract = "The most common evolutionary events at the molecular level are single-base substitutions, as well as insertions and deletions (indels) of short DNA segments. A large body of research has been devoted to developing probabilistic substitution models and inferring their parameters using likelihood and Bayesian approaches. In contrast, relatively little has been done to model indel dynamics, probably due to the difficulty in writing explicit likelihood functions. Here, we contribute to the effort of modeling indel dynamics by presenting SpartaABC, an approximate Bayesian computation (ABC) approach to infer indel parameters from sequence data (either aligned or unaligned). SpartaABC circumvents the need to use an explicit likelihood function by extracting summary statistics from simulated sequences. First, summary statistics are extracted from the input sequence data. Second, SpartaABC samples indel parameters from a prior distribution and uses them to simulate sequences. Third, it computes summary statistics from the simulated sets of sequences. By computing a distance between the summary statistics extracted from the input and each simulation, SpartaABC can provide an approximation to the posterior distribution of indel parameters as well as point estimates. We study the performance of our methodology and show that it provides accurate estimates of indel parameters in simulations. We next demonstrate the utility of SpartaABC by studying the impact of alignment errors on the inference of positive selection. A C++ program implementing SpartaABC is freely available at http://spartaabc.tau.ac.il.",
keywords = "Alignments, Approximate Bayesian computation, Indels, Simulations",
author = "Karin, {Eli Levy} and Dafna Shkedy and Haim Ashkenazy and Reed Cartwright and Tal Pupko",
year = "2017",
doi = "10.1093/gbe/evx084",
language = "English (US)",
volume = "9",
pages = "1280--1294",
journal = "Genome Biology and Evolution",
issn = "1759-6653",
publisher = "Oxford University Press",
number = "5",

}

TY - JOUR

T1 - Inferring rates and length-distributions of indels using approximate Bayesian computation

AU - Karin, Eli Levy

AU - Shkedy, Dafna

AU - Ashkenazy, Haim

AU - Cartwright, Reed

AU - Pupko, Tal

PY - 2017

Y1 - 2017

N2 - The most common evolutionary events at the molecular level are single-base substitutions, as well as insertions and deletions (indels) of short DNA segments. A large body of research has been devoted to developing probabilistic substitution models and inferring their parameters using likelihood and Bayesian approaches. In contrast, relatively little has been done to model indel dynamics, probably due to the difficulty in writing explicit likelihood functions. Here, we contribute to the effort of modeling indel dynamics by presenting SpartaABC, an approximate Bayesian computation (ABC) approach to infer indel parameters from sequence data (either aligned or unaligned). SpartaABC circumvents the need to use an explicit likelihood function by extracting summary statistics from simulated sequences. First, summary statistics are extracted from the input sequence data. Second, SpartaABC samples indel parameters from a prior distribution and uses them to simulate sequences. Third, it computes summary statistics from the simulated sets of sequences. By computing a distance between the summary statistics extracted from the input and each simulation, SpartaABC can provide an approximation to the posterior distribution of indel parameters as well as point estimates. We study the performance of our methodology and show that it provides accurate estimates of indel parameters in simulations. We next demonstrate the utility of SpartaABC by studying the impact of alignment errors on the inference of positive selection. A C++ program implementing SpartaABC is freely available at http://spartaabc.tau.ac.il.

KW - Alignments

KW - Approximate Bayesian computation

KW - Indels

KW - Simulations

UR - http://www.scopus.com/inward/record.url?scp=85026671229&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85026671229&partnerID=8YFLogxK

U2 - 10.1093/gbe/evx084

DO - 10.1093/gbe/evx084

M3 - Article

C2 - 28453624

AN - SCOPUS:85026671229

VL - 9

SP - 1280

EP - 1294

JO - Genome Biology and Evolution

JF - Genome Biology and Evolution

SN - 1759-6653

IS - 5

ER -