Inferring species phylogenies from multiple genes: Concatenated sequence tree versus consensus gene tree

Sudhindra R. Gadagkar; Michael S. Rosenberg; Sudhir Kumar

doi:10.1002/jez.b.21026

Research output: Contribution to journal › Article › peer-review

326 Scopus citations

Abstract

Phylogenetic trees from multiple genes can be obtained in two fundamentally different ways. In one, gene sequences are concatenated into a super-gene alignment, which is then analyzed to generate the species tree. In the other, phylogenies are inferred separately from each gene, and a consensus of these gene phylogenies is used to represent the species tree. Here, we have compared these two approaches by means of computer simulation, using 448 parameter sets, including evolutionary rate, sequence length, base composition, and transition/transversion rate bias. In these simulations, we emphasized a worst-case scenario analysis in which 100 replicate datasets for each evolutionary parameter set (gene) were generated, and the replicate dataset that produced a tree topology showing the largest number of phylogenetic errors was selected to represent that parameter set. Both randomly selected and worst-case replicates were utilized to compare the consensus and concatenation approaches primarily using the neighbor-joining (NJ) method. We find that the concatenation approach yields more accurate trees, even when the sequences concatenated have evolved with very different substitution patterns and no attempts are made to accommodate these differences while inferring phylogenies. These results appear to hold true for parsimony and likelihood methods as well. The concatenation approach shows >95% accuracy with only 10 genes. However, this gain in accuracy is sometimes accompanied by reinforcement of certain systematic biases, resulting in spuriously high bootstrap support for incorrect partitions, whether we employ site, gene, or a combined bootstrap resampling approach. Therefore, it will be prudent to report the number of individual genes supporting an inferred clade in the concatenated sequence tree, in addition to the bootstrap support.
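The data-handling steps named in the abstract (building a super-gene alignment, resampling by site or by gene, and counting the genes that support a clade) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the toy sequences, and the representation of each gene tree as a set of clades are all assumptions made for the example, and tree inference itself (for example by neighbor-joining) is left out.

```python
import random
from typing import Dict, FrozenSet, List, Set

# Hypothetical representation: taxon name -> aligned sequence for one gene.
Alignment = Dict[str, str]
Clade = FrozenSet[str]


def concatenate(genes: List[Alignment]) -> Alignment:
    """Join per-gene alignments end to end into one super-gene alignment."""
    taxa = sorted(genes[0])
    for gene in genes:
        if sorted(gene) != taxa:
            raise ValueError("every gene must cover the same taxa")
    return {t: "".join(gene[t] for gene in genes) for t in taxa}


def site_bootstrap(alignment: Alignment, rng: random.Random) -> Alignment:
    """Resample alignment columns with replacement (same columns for all taxa)."""
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {t: "".join(seq[c] for c in cols) for t, seq in alignment.items()}


def gene_bootstrap(genes: List[Alignment], rng: random.Random) -> Alignment:
    """Resample whole genes with replacement, then concatenate them."""
    return concatenate([rng.choice(genes) for _ in genes])


def genes_supporting(clade: Clade, gene_tree_clades: List[Set[Clade]]) -> int:
    """Count the gene trees whose clade set contains the given clade."""
    return sum(1 for clades in gene_tree_clades if clade in clades)


# Toy data, invented for illustration only.
genes = [
    {"A": "ACGT", "B": "ACGA", "C": "TCGA"},
    {"A": "GG", "B": "GC", "C": "CC"},
]
supergene = concatenate(genes)  # {"A": "ACGTGG", "B": "ACGAGC", "C": "TCGACC"}

rng = random.Random(0)
site_replicate = site_bootstrap(supergene, rng)
gene_replicate = gene_bootstrap(genes, rng)

# Each gene tree is reduced here to the set of clades it contains.
gene_tree_clades = [
    {frozenset({"A", "B"})},
    {frozenset({"B", "C"})},
    {frozenset({"A", "B"})},
]
support = genes_supporting(frozenset({"A", "B"}), gene_tree_clades)  # 2
```

Reporting `support` next to the bootstrap value for a clade follows the abstract's closing recommendation. A combined resampling scheme could be approximated by applying `site_bootstrap` to each gene drawn in `gene_bootstrap` before concatenating, though that reading of "combined" is an assumption here.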

Original language	English (US)
Pages (from-to)	64-74
Number of pages	11
Journal	Journal of Experimental Zoology Part B: Molecular and Developmental Evolution
Volume	304
Issue number	1
DOIs	https://doi.org/10.1002/jez.b.21026
State	Published - Jan 15 2005

ASJC Scopus subject areas

Ecology, Evolution, Behavior and Systematics
Molecular Medicine
Animal Science and Zoology
Genetics
Developmental Biology

Access to Document

10.1002/jez.b.21026

Cite this

@article{5f58d901605d4378a0be5ed799714edd,

title = "Inferring species phylogenies from multiple genes: Concatenated sequence tree versus consensus gene tree",

abstract = "Phylogenetic trees from multiple genes can be obtained in two fundamentally different ways. In one, gene sequences are concatenated into a super-gene alignment, which is then analyzed to generate the species tree. In the other, phylogenies are inferred separately from each gene, and a consensus of these gene phylogenies is used to represent the species tree. Here, we have compared these two approaches by means of computer simulation, using 448 parameter sets, including evolutionary rate, sequence length, base composition, and transition/transversion rate bias. In these simulations, we emphasized a worst-case scenario analysis in which 100 replicate datasets for each evolutionary parameter set (gene) were generated, and the replicate dataset that produced a tree topology showing the largest number of phylogenetic errors was selected to represent that parameter set. Both randomly selected and worst-case replicates were utilized to compare the consensus and concatenation approaches primarily using the neighbor-joining (NJ) method. We find that the concatenation approach yields more accurate trees, even when the sequences concatenated have evolved with very different substitution patterns and no attempts are made to accommodate these differences while inferring phylogenies. These results appear to hold true for parsimony and likelihood methods as well. The concatenation approach shows >95% accuracy with only 10 genes. However, this gain in accuracy is sometimes accompanied by reinforcement of certain systematic biases, resulting in spuriously high bootstrap support for incorrect partitions, whether we employ site, gene, or a combined bootstrap resampling approach. Therefore, it will be prudent to report the number of individual genes supporting an inferred clade in the concatenated sequence tree, in addition to the bootstrap support.",

author = "Gadagkar, {Sudhindra R.} and Rosenberg, {Michael S.} and Kumar, Sudhir",

year = "2005",

month = jan,

day = "15",

doi = "10.1002/jez.b.21026",

language = "English (US)",

volume = "304",

pages = "64--74",

journal = "Journal of Experimental Zoology Part B: Molecular and Developmental Evolution",

issn = "1552-5007",

publisher = "John Wiley and Sons Inc.",

number = "1",

}

TY - JOUR

T1 - Inferring species phylogenies from multiple genes

T2 - Concatenated sequence tree versus consensus gene tree

AU - Gadagkar, Sudhindra R.

AU - Rosenberg, Michael S.

AU - Kumar, Sudhir

PY - 2005/1/15

Y1 - 2005/1/15

N2 - Phylogenetic trees from multiple genes can be obtained in two fundamentally different ways. In one, gene sequences are concatenated into a super-gene alignment, which is then analyzed to generate the species tree. In the other, phylogenies are inferred separately from each gene, and a consensus of these gene phylogenies is used to represent the species tree. Here, we have compared these two approaches by means of computer simulation, using 448 parameter sets, including evolutionary rate, sequence length, base composition, and transition/transversion rate bias. In these simulations, we emphasized a worst-case scenario analysis in which 100 replicate datasets for each evolutionary parameter set (gene) were generated, and the replicate dataset that produced a tree topology showing the largest number of phylogenetic errors was selected to represent that parameter set. Both randomly selected and worst-case replicates were utilized to compare the consensus and concatenation approaches primarily using the neighbor-joining (NJ) method. We find that the concatenation approach yields more accurate trees, even when the sequences concatenated have evolved with very different substitution patterns and no attempts are made to accommodate these differences while inferring phylogenies. These results appear to hold true for parsimony and likelihood methods as well. The concatenation approach shows >95% accuracy with only 10 genes. However, this gain in accuracy is sometimes accompanied by reinforcement of certain systematic biases, resulting in spuriously high bootstrap support for incorrect partitions, whether we employ site, gene, or a combined bootstrap resampling approach. Therefore, it will be prudent to report the number of individual genes supporting an inferred clade in the concatenated sequence tree, in addition to the bootstrap support.

UR - http://www.scopus.com/inward/record.url?scp=14844313275&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=14844313275&partnerID=8YFLogxK

U2 - 10.1002/jez.b.21026

DO - 10.1002/jez.b.21026

M3 - Article

C2 - 15593277

AN - SCOPUS:14844313275

SN - 1552-5007

VL - 304

SP - 64

EP - 74

JO - Journal of Experimental Zoology Part B: Molecular and Developmental Evolution

JF - Journal of Experimental Zoology Part B: Molecular and Developmental Evolution

IS - 1

ER -
