Improved bootstrap confidence limits in large-scale phylogenies, with an example from Neo-Astragalus (Leguminosae)

Michael J. Sanderson, Martin Wojciechowski

Research output: Contribution to journal, Article

106 Citations (Scopus)

Abstract

Phylogenetic analyses of large data sets pose special challenges, including the apparent tendency for the bootstrap support for a clade to decline with increased taxon sampling of that clade. We document this decline in data sets with increasing numbers of taxa in Astragalus, the most species-rich angiosperm genus. Support for one subclade, Neo-Astragalus, declined monotonically with increased sampling of taxa inside Neo-Astragalus, irrespective of whether parsimony or neighbor-joining methods were used or of which particular heuristic search algorithm was used (although more stringent algorithms tended to yield higher support). Three possible explanations for this decline were examined, including (1) mistaken assignment of the most recent common ancestor of the taxon sample (and its bootstrap support) with the most recent common ancestor of the clade from which it was sampled; (2) computational limitations of heuristic search strategies; and (3) statistical bias in bootstrap proportions, especially that from random homoplasy distributed among taxa. The best explanation appears to be (3), although computational shortcomings (2) may explain some of the problem. The bootstrap proportion, as currently used in phylogenetic analysis, does not accurately capture the classical notion of confidence assessments on the null hypothesis of nonmonophyly, especially in large data sets. More accurate assessments of confidence as type 1 error levels (relying on iterated bootstrap methods) remove most of the monotonic decline in confidence with increasing numbers of taxa.
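For readers unfamiliar with the term, the bootstrap proportion discussed in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure. The function names `bootstrap_proportion` and `toy_clade_test`, the dictionary-based alignment format, and above all the toy criterion for "recovering" a clade are assumptions made for illustration; a real analysis would run a parsimony or neighbor-joining tree search on each replicate.

```python
import random


def bootstrap_proportion(alignment, clade_recovered, n_reps=200, seed=0):
    """Return the fraction of bootstrap replicates in which a clade is recovered.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    clade_recovered: callable taking a replicate alignment and returning bool.
        (Hypothetical hook; a real study would infer a tree here.)
    """
    rng = random.Random(seed)
    taxa = list(alignment)
    n_sites = len(alignment[taxa[0]])
    hits = 0
    for _ in range(n_reps):
        # Resample alignment columns (sites) with replacement.
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        replicate = {t: "".join(alignment[t][c] for c in cols) for t in taxa}
        if clade_recovered(replicate):
            hits += 1
    return hits / n_reps


def toy_clade_test(ingroup):
    """Stand-in criterion, NOT a tree search (assumption for illustration).

    The clade counts as recovered if at least one site carries a single state
    shared by every ingroup taxon and absent from every other taxon.
    """
    ingroup = set(ingroup)

    def test(aln):
        inside = [seq for taxon, seq in aln.items() if taxon in ingroup]
        outside = [seq for taxon, seq in aln.items() if taxon not in ingroup]
        for i in range(len(inside[0])):
            states = {seq[i] for seq in inside}
            if len(states) == 1 and all(seq[i] not in states for seq in outside):
                return True
        return False

    return test
```

Calling `bootstrap_proportion(aln, toy_clade_test({"A", "B"}))` returns a value between 0 and 1. An iterated bootstrap, as mentioned in the abstract, would resample again from each replicate to calibrate that proportion; that second level is not shown here.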

Original language: English (US)
Pages (from-to): 671-685
Number of pages: 15
Journal: Systematic Biology
Volume: 49
Issue number: 4
State: Published - Dec 2000
Externally published: Yes

Fingerprint

Astragalus
Phylogeny
Fabaceae
ancestry
common ancestry
heuristics
Angiosperms
phylogenetics
sampling
bootstrapping
methodology
Datasets

Keywords

  • Bootstrap
  • Phylogeny reconstruction
  • Species richness
  • Taxon sampling

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics

Cite this

Improved bootstrap confidence limits in large-scale phylogenies, with an example from Neo-Astragalus (Leguminosae). / Sanderson, Michael J.; Wojciechowski, Martin.

In: Systematic Biology, Vol. 49, No. 4, 12.2000, p. 671-685.

@article{c696cc66319746dd98e574cc7a226439,
title = "Improved bootstrap confidence limits in large-scale phylogenies, with an example from Neo-Astragalus (Leguminosae)",
abstract = "Phylogenetic analyses of large data sets pose special challenges, including the apparent tendency for the bootstrap support for a clade to decline with increased taxon sampling of that clade. We document this decline in data sets with increasing numbers of taxa in Astragalus, the most species-rich angiosperm genus. Support for one subclade, Neo-Astragalus, declined monotonically with increased sampling of taxa inside Neo-Astragalus, irrespective of whether parsimony or neighbor-joining methods were used or of which particular heuristic search algorithm was used (although more stringent algorithms tended to yield higher support). Three possible explanations for this decline were examined, including (1) mistaken assignment of the most recent common ancestor of the taxon sample (and its bootstrap support) with the most recent common ancestor of the clade from which it was sampled; (2) computational limitations of heuristic search strategies; and (3) statistical bias in bootstrap proportions, especially that from random homoplasy distributed among taxa. The best explanation appears to be (3), although computational shortcomings (2) may explain some of the problem. The bootstrap proportion, as currently used in phylogenetic analysis, does not accurately capture the classical notion of confidence assessments on the null hypothesis of nonmonophyly, especially in large data sets. More accurate assessments of confidence as type 1 error levels (relying on iterated bootstrap methods) remove most of the monotonic decline in confidence with increasing numbers of taxa.",
keywords = "Bootstrap, Phylogeny reconstruction, Species richness, Taxon sampling",
author = "Sanderson, {Michael J.} and Wojciechowski, Martin",
year = "2000",
month = "12",
language = "English (US)",
volume = "49",
pages = "671--685",
journal = "Systematic Biology",
issn = "1063-5157",
publisher = "Oxford University Press",
number = "4",
}

TY - JOUR

T1 - Improved bootstrap confidence limits in large-scale phylogenies, with an example from Neo-Astragalus (Leguminosae)

AU - Sanderson, Michael J.

AU - Wojciechowski, Martin

PY - 2000/12

Y1 - 2000/12

AB - Phylogenetic analyses of large data sets pose special challenges, including the apparent tendency for the bootstrap support for a clade to decline with increased taxon sampling of that clade. We document this decline in data sets with increasing numbers of taxa in Astragalus, the most species-rich angiosperm genus. Support for one subclade, Neo-Astragalus, declined monotonically with increased sampling of taxa inside Neo-Astragalus, irrespective of whether parsimony or neighbor-joining methods were used or of which particular heuristic search algorithm was used (although more stringent algorithms tended to yield higher support). Three possible explanations for this decline were examined, including (1) mistaken assignment of the most recent common ancestor of the taxon sample (and its bootstrap support) with the most recent common ancestor of the clade from which it was sampled; (2) computational limitations of heuristic search strategies; and (3) statistical bias in bootstrap proportions, especially that from random homoplasy distributed among taxa. The best explanation appears to be (3), although computational shortcomings (2) may explain some of the problem. The bootstrap proportion, as currently used in phylogenetic analysis, does not accurately capture the classical notion of confidence assessments on the null hypothesis of nonmonophyly, especially in large data sets. More accurate assessments of confidence as type 1 error levels (relying on iterated bootstrap methods) remove most of the monotonic decline in confidence with increasing numbers of taxa.

KW - Bootstrap

KW - Phylogeny reconstruction

KW - Species richness

KW - Taxon sampling

UR - http://www.scopus.com/inward/record.url?scp=0034351392&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0034351392&partnerID=8YFLogxK

M3 - Article

C2 - 12116433

AN - SCOPUS:0034351392

VL - 49

SP - 671

EP - 685

JO - Systematic Biology

JF - Systematic Biology

SN - 1063-5157

IS - 4

ER -