Evolutionary distance estimation under heterogeneous substitution pattern among lineages

Koichiro Tamura, Sudhir Kumar

Research output: Contribution to journal › Article

160 Citations (Scopus)

Abstract

Most of the sophisticated methods to estimate evolutionary divergence between DNA sequences assume that the two sequences have evolved with the same pattern of nucleotide substitution after their divergence from their most recent common ancestor (homogeneity assumption). If this assumption is violated, the evolutionary distance estimated will be biased, which may result in biased estimates of divergence times and substitution rates, and may lead to erroneous branching patterns in the inferred phylogenies. Here we present a simple modification for existing distance estimation methods to relax the assumption of the substitution pattern homogeneity among lineages when analyzing DNA and protein sequences. Results from computer simulations and empirical data analyses for human and mouse genes are presented to demonstrate that the proposed modification reduces the estimation bias considerably and that the modified method performs much better than the LogDet methods, which do not require the homogeneity assumption in estimating the number of substitutions per site. We also discuss the relationship of the substitution and mutation rate estimates when the substitution pattern is not the same in the lineages leading to the two sequences compared.
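
For orientation, the LogDet family that the abstract uses as a comparison baseline can be sketched in a few lines. The block below is a minimal illustration of the paralinear form of that distance next to the Jukes-Cantor distance, which does assume homogeneity. It is not the modification proposed in the article, whose formulas are not reproduced on this page. The function names, the gap handling, and the pure-Python determinant are assumptions made for this sketch.

```python
import math
from collections import Counter

BASES = "ACGT"


def joint_frequencies(seq_x, seq_y):
    """Return the 4x4 matrix J, where J[i][j] is the proportion of aligned
    sites with base i in seq_x and base j in seq_y."""
    if len(seq_x) != len(seq_y):
        raise ValueError("sequences must be aligned to equal length")
    # Assumption for this sketch: sites with gaps or ambiguity codes are skipped.
    pairs = [(a, b) for a, b in zip(seq_x.upper(), seq_y.upper())
             if a in BASES and b in BASES]
    if not pairs:
        raise ValueError("no comparable sites")
    counts = Counter(pairs)
    n = len(pairs)
    return [[counts[(a, b)] / n for b in BASES] for a in BASES]


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    size = len(m)
    det = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for row in range(col + 1, size):
            factor = m[row][col] / m[col][col]
            for k in range(col, size):
                m[row][k] -= factor * m[col][k]
    return det


def paralinear_distance(seq_x, seq_y):
    """Paralinear (LogDet-type) distance:
    d = -(1/4) * ln( det J / sqrt(det D_x * det D_y) ),
    where D_x and D_y are diagonal matrices of each sequence's base frequencies."""
    j = joint_frequencies(seq_x, seq_y)
    freq_x = [sum(row) for row in j]        # base frequencies in seq_x
    freq_y = [sum(col) for col in zip(*j)]  # base frequencies in seq_y
    det_j = determinant(j)
    scale = math.sqrt(math.prod(freq_x) * math.prod(freq_y))
    if det_j <= 0.0 or scale == 0.0:
        raise ValueError("distance undefined for these sequences")
    return -0.25 * math.log(det_j / scale)


def jukes_cantor_distance(seq_x, seq_y):
    """Jukes-Cantor distance, which assumes one homogeneous pattern with
    equal base frequencies: d = -(3/4) * ln(1 - 4p/3)."""
    j = joint_frequencies(seq_x, seq_y)
    p = 1.0 - sum(j[i][i] for i in range(4))  # proportion of differing sites
    if p >= 0.75:
        raise ValueError("distance undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

When both sequences have equal base frequencies and all substitution types are equally common, the two functions return the same value. They diverge as base composition drifts apart between the two lineages, which is the situation the abstract describes.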

Original language: English (US)
Pages (from-to): 1727-1736
Number of pages: 10
Journal: Molecular Biology and Evolution
Volume: 19
Issue number: 10
State: Published - Oct 1 2002

Keywords

  • Base composition
  • Computer simulation
  • LogDet
  • Mutation rate
  • Substitution rate

ASJC Scopus subject areas

  • Genetics
  • Biochemistry
  • Genetics (clinical)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Ecology, Evolution, Behavior and Systematics
  • Agricultural and Biological Sciences (miscellaneous)
  • Molecular Biology

Cite this

Evolutionary distance estimation under heterogeneous substitution pattern among lineages. / Tamura, Koichiro; Kumar, Sudhir.

In: Molecular Biology and Evolution, Vol. 19, No. 10, 01.10.2002, p. 1727-1736.

@article{3c7ad6507a1347368b02126ef9bba5ca,
title = "Evolutionary distance estimation under heterogeneous substitution pattern among lineages",
keywords = "Base composition, Computer simulation, LogDet, Mutation rate, Substitution rate",
author = "Koichiro Tamura and Sudhir Kumar",
year = "2002",
month = "10",
day = "1",
language = "English (US)",
volume = "19",
pages = "1727--1736",
journal = "Molecular Biology and Evolution",
issn = "0737-4038",
publisher = "Oxford University Press",
number = "10",
}

TY  - JOUR
T1  - Evolutionary distance estimation under heterogeneous substitution pattern among lineages
AU  - Tamura, Koichiro
AU  - Kumar, Sudhir
PY  - 2002/10/1
Y1  - 2002/10/1
KW  - Base composition
KW  - Computer simulation
KW  - LogDet
KW  - Mutation rate
KW  - Substitution rate
UR  - http://www.scopus.com/inward/record.url?scp=0036790028&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=0036790028&partnerID=8YFLogxK
M3  - Article
C2  - 12270899
AN  - SCOPUS:0036790028
VL  - 19
SP  - 1727
EP  - 1736
JO  - Molecular Biology and Evolution
JF  - Molecular Biology and Evolution
SN  - 0737-4038
IS  - 10
ER  - 