Multiple sequence alignment accuracy and phylogenetic inference

T. Heath Ogden, Michael S. Rosenberg

Research output: Contribution to journal › Article

137 Citations (Scopus)

Abstract

Phylogenies are often thought to depend more on the specifics of the sequence alignment than on the method of reconstruction. Sequences containing insertion and deletion events were simulated to determine the role that alignment accuracy plays in phylogenetic inference. Data sets were simulated for pectinate, balanced, and random tree shapes under different conditions (ultrametric equal branch length, ultrametric random branch length, nonultrametric random branch length). Comparisons between hypothesized alignments and true alignments enabled determination of two measures of alignment accuracy: that of the total data set and that of individual branches. In general, our results indicate that as alignment error increases, topological accuracy decreases. This trend was much more pronounced for data sets derived from more pectinate topologies. In contrast, for balanced, ultrametric, equal branch length tree shapes, alignment inaccuracy had little average effect on tree reconstruction. These conclusions are based on average trends of many analyses under different conditions, and any one specific analysis, independent of the alignment accuracy, may recover very accurate or inaccurate topologies. Maximum likelihood and Bayesian methods, in general, outperformed neighbor joining and maximum parsimony in terms of tree reconstruction accuracy. Results also indicated that as the lengths of a branch and of its neighboring branches increase, alignment accuracy decreases, and that the length of the neighboring branches is the major factor in topological accuracy. Thus, multiple sequence alignment can have important downstream effects on topological reconstruction.
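The abstract does not state how a hypothesized alignment was scored against the true one. As an illustration only, the sketch below assumes a common pairwise criterion: the fraction of residue pairs aligned in the true alignment that are also aligned in the hypothesized alignment. The function names `aligned_pairs` and `alignment_accuracy` and the toy sequences are hypothetical, and the sketch covers only a whole-data-set score, not the per-branch measure.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of residue pairs that share a column.

    `alignment` maps a sequence name to its gapped row ('-' marks a gap).
    A residue is identified by (name, index among that sequence's residues),
    so the same residue can be recognised in two different alignments.
    """
    names = sorted(alignment)
    lengths = {len(alignment[name]) for name in names}
    if len(lengths) != 1:
        raise ValueError("all alignment rows must have the same length")

    next_residue = {name: 0 for name in names}
    pairs = set()
    for col in range(lengths.pop()):
        present = []
        for name in names:
            if alignment[name][col] != "-":
                present.append((name, next_residue[name]))
                next_residue[name] += 1
        # Every two residues in the same column form one aligned pair.
        pairs.update(combinations(present, 2))
    return pairs


def alignment_accuracy(true_aln, hyp_aln):
    """Fraction of true aligned residue pairs recovered by `hyp_aln`."""
    true_pairs = aligned_pairs(true_aln)
    if not true_pairs:
        return 1.0  # nothing to recover
    return len(true_pairs & aligned_pairs(hyp_aln)) / len(true_pairs)


# Toy data (assumed, not from the study): same sequences, different gap placement.
true_aln = {"A": "AC-GT", "B": "ACTGT", "C": "A--GT"}
hyp_aln = {"A": "ACG-T", "B": "ACTGT", "C": "A-G-T"}
print(alignment_accuracy(true_aln, hyp_aln))  # 8 of 10 true pairs recovered -> 0.8
```

Identifying residues by their position within the ungapped sequence, rather than by column, is what lets the two alignments be compared even when their lengths differ.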

Original language: English (US)
Pages (from-to): 314-328
Number of pages: 15
Journal: Systematic Biology
Volume: 55
Issue number: 2
DOI: 10.1080/10635150500541730
State: Published - Apr 2006

Fingerprint

Sequence Alignment
sequence alignment
phylogenetics
phylogeny
Sequence Deletion
Insertional Mutagenesis
Phylogeny
topology
Datasets
alignment

Keywords

  • Bayesian
  • Maximum likelihood
  • Maximum parsimony
  • Multiple sequence alignment
  • Neighbor joining
  • Phylogenetics
  • Simulation
  • Tree reconstruction

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics
  • Genetics

Cite this

Multiple sequence alignment accuracy and phylogenetic inference. / Ogden, T. Heath; Rosenberg, Michael S.

In: Systematic Biology, Vol. 55, No. 2, 04.2006, p. 314-328.

Ogden, T. Heath ; Rosenberg, Michael S. / Multiple sequence alignment accuracy and phylogenetic inference. In: Systematic Biology. 2006 ; Vol. 55, No. 2. pp. 314-328.
@article{d2908f3da23947e7aef768e478a71536,
title = "Multiple sequence alignment accuracy and phylogenetic inference",
abstract = "Phylogenies are often thought to be more dependent upon the specifics of the sequence alignment rather than on the method of reconstruction. Simulation of sequences containing insertion and deletion events was performed in order to determine the role that alignment accuracy plays during phylogenetic inference. Data sets were simulated for pectinate, balanced, and random tree shapes under different conditions (ultrametric equal branch length, ultrametric random branch length, nonultrametric random branch length). Comparisons between hypothesized alignments and true alignments enabled determination of two measures of alignment accuracy, that of the total data set and that of individual branches. In general, our results indicate that as alignment error increases, topological accuracy decreases. This trend was much more pronounced for data sets derived from more pectinate topologies. In contrast, for balanced, ultrametric, equal branch length tree shapes, alignment inaccuracy had little average effect on tree reconstruction. These conclusions are based on average trends of many analyses under different conditions, and any one specific analysis, independent of the alignment accuracy, may recover very accurate or inaccurate topologies. Maximum likelihood and Bayesian, in general, outperformed neighbor joining and maximum parsimony in terms of tree reconstruction accuracy. Results also indicated that as the length of the branch and of the neighboring branches increase, alignment accuracy decreases, and the length of the neighboring branches is the major factor in topological accuracy. Thus, multiple-sequence alignment can be an important factor in downstream effects on topological reconstruction.",
keywords = "Bayesian, Maximum likelihood, Maximum parsimony, Multiple sequence alignment, Neighbor joining, Phylogenetics, Simulation, Tree reconstruction",
author = "Ogden, {T. Heath} and Rosenberg, {Michael S.}",
year = "2006",
month = "4",
doi = "10.1080/10635150500541730",
language = "English (US)",
volume = "55",
pages = "314--328",
journal = "Systematic Biology",
issn = "1063-5157",
publisher = "Oxford University Press",
number = "2",

}

TY - JOUR
T1 - Multiple sequence alignment accuracy and phylogenetic inference
AU - Ogden, T. Heath
AU - Rosenberg, Michael S.
PY - 2006/4
Y1 - 2006/4

AB - Phylogenies are often thought to be more dependent upon the specifics of the sequence alignment rather than on the method of reconstruction. Simulation of sequences containing insertion and deletion events was performed in order to determine the role that alignment accuracy plays during phylogenetic inference. Data sets were simulated for pectinate, balanced, and random tree shapes under different conditions (ultrametric equal branch length, ultrametric random branch length, nonultrametric random branch length). Comparisons between hypothesized alignments and true alignments enabled determination of two measures of alignment accuracy, that of the total data set and that of individual branches. In general, our results indicate that as alignment error increases, topological accuracy decreases. This trend was much more pronounced for data sets derived from more pectinate topologies. In contrast, for balanced, ultrametric, equal branch length tree shapes, alignment inaccuracy had little average effect on tree reconstruction. These conclusions are based on average trends of many analyses under different conditions, and any one specific analysis, independent of the alignment accuracy, may recover very accurate or inaccurate topologies. Maximum likelihood and Bayesian, in general, outperformed neighbor joining and maximum parsimony in terms of tree reconstruction accuracy. Results also indicated that as the length of the branch and of the neighboring branches increase, alignment accuracy decreases, and the length of the neighboring branches is the major factor in topological accuracy. Thus, multiple-sequence alignment can be an important factor in downstream effects on topological reconstruction.

KW - Bayesian
KW - Maximum likelihood
KW - Maximum parsimony
KW - Multiple sequence alignment
KW - Neighbor joining
KW - Phylogenetics
KW - Simulation
KW - Tree reconstruction
UR - http://www.scopus.com/inward/record.url?scp=33744993430&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33744993430&partnerID=8YFLogxK
U2 - 10.1080/10635150500541730
DO - 10.1080/10635150500541730
M3 - Article
VL - 55
SP - 314
EP - 328
JO - Systematic Biology
JF - Systematic Biology
SN - 1063-5157
IS - 2
ER -