Problems and solutions for estimating indel rates and length distributions

Research output: Contribution to journal › Article

33 Citations (Scopus)

Abstract

Insertions and deletions (indels) are fundamental but understudied components of molecular evolution. Here we present an expectation-maximization algorithm built on a pair hidden Markov model that is able to properly handle indels in neutrally evolving DNA sequences. From a data set of orthologous introns, we estimate relative rates and length distributions of indels among primates and rodents. This technique has the advantage of potentially handling large genomic data sets. We find that a zeta power-law model of indel lengths provides a much better fit than the traditional geometric model and that indel processes are conserved between our taxa. The estimated relative rates are about 12-16 indels per 100 substitutions, and the estimated power-law magnitudes are about 1.6-1.7. More significantly, we find that using the traditional geometric/affine model of indel lengths introduces artifacts into evolutionary analysis, casting doubt on studies of the evolution and diversity of indel formation using traditional models and invalidating measures of species divergence that include indel lengths.
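The contrast the abstract draws between a zeta power-law and a geometric model of indel lengths can be illustrated with a small sketch. This is not the paper's expectation-maximization algorithm or its pair hidden Markov model. The exponent `S = 1.65` is an assumed value picked from inside the reported 1.6-1.7 range, and the rule of choosing the geometric parameter `Q` so that both models give the same probability to length-1 indels is a hypothetical choice made only for comparison. All function names are illustrative.

```python
import math


def zeta(s, n=1000):
    """Approximate the Riemann zeta function for s > 1.

    Sums the first n-1 terms directly and adds an Euler-Maclaurin
    estimate (integral plus half of the first omitted term) for the rest.
    """
    head = sum(k ** -s for k in range(1, n))
    return head + n ** (1 - s) / (s - 1) + 0.5 * n ** -s


def zeta_pmf(k, s, z=None):
    """P(L = k) under the zeta power-law model, for k = 1, 2, ..."""
    z = zeta(s) if z is None else z
    return k ** -s / z


def geometric_pmf(k, q):
    """P(L = k) under the geometric model, for k = 1, 2, ..."""
    return (1 - q) * q ** (k - 1)


def zeta_tail(k, s):
    """P(L >= k) under the zeta power-law model."""
    z = zeta(s)
    return 1.0 - sum(zeta_pmf(j, s, z) for j in range(1, k))


def geometric_tail(k, q):
    """P(L >= k) under the geometric model."""
    return q ** (k - 1)


# Assumption: an exponent inside the 1.6-1.7 range reported in the abstract.
S = 1.65
# Assumption: match the two models on P(L = 1); not the paper's fitting method.
Q = 1 - 1 / zeta(S)

if __name__ == "__main__":
    print("k  P_zeta(L>=k)  P_geom(L>=k)")
    for k in (1, 5, 20, 50):
        print(f"{k:<3d}{zeta_tail(k, S):<14.3e}{geometric_tail(k, Q):.3e}")
```

Under these assumptions the geometric tail falls off exponentially while the power-law tail falls off polynomially, so long indels that are plausible under the zeta model are assigned negligible probability by the geometric model.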

Original language: English (US)
Pages (from-to): 473-480
Number of pages: 8
Journal: Molecular Biology and Evolution
Volume: 26
Issue number: 2
DOI: 10.1093/molbev/msn275
State: Published - Feb 2009
Externally published: Yes

Keywords

  • Comparative genomics
  • Conservation
  • Estimation
  • Indel
  • Power law

ASJC Scopus subject areas

  • Genetics
  • Molecular Biology
  • Ecology, Evolution, Behavior and Systematics

Cite this

Problems and solutions for estimating indel rates and length distributions. / Cartwright, Reed.

In: Molecular Biology and Evolution, Vol. 26, No. 2, 02.2009, p. 473-480.

@article{608345e2101a4ffcabfe20b4f57d0dab,
title = "Problems and solutions for estimating indel rates and length distributions",
abstract = "Insertions and deletions (indels) are fundamental but understudied components of molecular evolution. Here we present an expectation-maximization algorithm built on a pair hidden Markov model that is able to properly handle indels in neutrally evolving DNA sequences. From a data set of orthologous introns, we estimate relative rates and length distributions of indels among primates and rodents. This technique has the advantage of potentially handling large genomic data sets. We find that a zeta power-law model of indel lengths provides a much better fit than the traditional geometric model and that indel processes are conserved between our taxa. The estimated relative rates are about 12-16 indels per 100 substitutions, and the estimated power-law magnitudes are about 1.6-1.7. More significantly, we find that using the traditional geometric/affine model of indel lengths introduces artifacts into evolutionary analysis, casting doubt on studies of the evolution and diversity of indel formation using traditional models and invalidating measures of species divergence that include indel lengths.",
keywords = "Comparative genomics, Conservation, Estimation, Indel, Power law",
author = "Reed Cartwright",
year = "2009",
month = "2",
doi = "10.1093/molbev/msn275",
language = "English (US)",
volume = "26",
pages = "473--480",
journal = "Molecular Biology and Evolution",
issn = "0737-4038",
publisher = "Oxford University Press",
number = "2",
}

TY  - JOUR
T1  - Problems and solutions for estimating indel rates and length distributions
AU  - Cartwright, Reed
PY  - 2009/2
Y1  - 2009/2
N2  - Insertions and deletions (indels) are fundamental but understudied components of molecular evolution. Here we present an expectation-maximization algorithm built on a pair hidden Markov model that is able to properly handle indels in neutrally evolving DNA sequences. From a data set of orthologous introns, we estimate relative rates and length distributions of indels among primates and rodents. This technique has the advantage of potentially handling large genomic data sets. We find that a zeta power-law model of indel lengths provides a much better fit than the traditional geometric model and that indel processes are conserved between our taxa. The estimated relative rates are about 12-16 indels per 100 substitutions, and the estimated power-law magnitudes are about 1.6-1.7. More significantly, we find that using the traditional geometric/affine model of indel lengths introduces artifacts into evolutionary analysis, casting doubt on studies of the evolution and diversity of indel formation using traditional models and invalidating measures of species divergence that include indel lengths.
AB  - Insertions and deletions (indels) are fundamental but understudied components of molecular evolution. Here we present an expectation-maximization algorithm built on a pair hidden Markov model that is able to properly handle indels in neutrally evolving DNA sequences. From a data set of orthologous introns, we estimate relative rates and length distributions of indels among primates and rodents. This technique has the advantage of potentially handling large genomic data sets. We find that a zeta power-law model of indel lengths provides a much better fit than the traditional geometric model and that indel processes are conserved between our taxa. The estimated relative rates are about 12-16 indels per 100 substitutions, and the estimated power-law magnitudes are about 1.6-1.7. More significantly, we find that using the traditional geometric/affine model of indel lengths introduces artifacts into evolutionary analysis, casting doubt on studies of the evolution and diversity of indel formation using traditional models and invalidating measures of species divergence that include indel lengths.
KW  - Comparative genomics
KW  - Conservation
KW  - Estimation
KW  - Indel
KW  - Power law
UR  - http://www.scopus.com/inward/record.url?scp=58449127271&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=58449127271&partnerID=8YFLogxK
U2  - 10.1093/molbev/msn275
DO  - 10.1093/molbev/msn275
M3  - Article
VL  - 26
SP  - 473
EP  - 480
JO  - Molecular Biology and Evolution
JF  - Molecular Biology and Evolution
SN  - 0737-4038
IS  - 2
ER  - 