Phylogenetic placement of metagenomic reads using the minimum evolution principle

Alan Filipski, Koichiro Tamura, Paul Billing-Ross, Oscar Murillo, Sudhir Kumar

Research output: Contribution to journal › Article

4 Citations (Scopus)

Abstract

Background: A central problem of computational metagenomics is determining the correct placement into an existing phylogenetic tree of individual reads (nucleotide sequences of varying lengths, ranging from hundreds to thousands of bases) obtained using next-generation sequencing of DNA samples from a mixture of known and unknown species. Correct placement allows us to easily identify or classify the sequences in the sample as to taxonomic position or function.

Results: Here we propose a novel method (PhyClass), based on the Minimum Evolution (ME) phylogenetic inference criterion, for determining the appropriate phylogenetic position of each read. Without using heuristics, the new approach efficiently finds the optimal placement of the unknown read in a reference phylogenetic tree given a sequence alignment for the taxa in the tree. In short, the total resulting branch length for the tree is computed for every possible placement of the unknown read and the placement that gives the smallest value for this total is the best (optimal) choice. By taking advantage of computational efficiencies and mathematical formulations, we are able to find the true optimal ME placement for each read in the phylogenetic tree. Using computer simulations, we assessed the accuracy of the new approach for different read lengths over a variety of data sets and phylogenetic trees. We found the accuracy of the new method to be good and comparable to existing Maximum Likelihood (ML) approaches.

Conclusions: In particular, we found that the consensus assignments based on ME and ML approaches are more correct than either method individually. This is true even when the statistical support for read assignments was low, which is inevitable given that individual reads are often short and come from only one gene.
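The placement rule stated in the abstract (compute the total branch length for every possible placement of the read, then keep the smallest) can be illustrated with a brute-force sketch. This is not the PhyClass implementation, whose efficient formulation is not described here. It assumes pairwise distances are already available, re-estimates all branch lengths by ordinary least squares for each candidate tree, and uses hypothetical names (`leaf_paths`, `ols_tree_length`, `place_read`) and a made-up four-taxon reference tree.

```python
import itertools
import numpy as np


def leaf_paths(edges, leaves):
    """Map each leaf pair to the list of edge indices on the path joining them."""
    adj = {}
    for idx, (u, v) in enumerate(edges):
        adj.setdefault(u, []).append((v, idx))
        adj.setdefault(v, []).append((u, idx))

    def path(src, dst):
        # Depth-first walk; a tree has no cycles, so tracking the parent suffices.
        stack = [(src, None, [])]
        while stack:
            node, parent, used = stack.pop()
            if node == dst:
                return used
            for nxt, idx in adj[node]:
                if nxt != parent:
                    stack.append((nxt, node, used + [idx]))
        raise ValueError("tree is not connected")

    return {(a, b): path(a, b) for a, b in itertools.combinations(leaves, 2)}


def ols_tree_length(edges, leaves, dist):
    """Sum of least-squares branch lengths for a fixed topology."""
    paths = leaf_paths(edges, leaves)
    A = np.zeros((len(paths), len(edges)))
    d = np.zeros(len(paths))
    for row, ((a, b), used) in enumerate(paths.items()):
        A[row, used] = 1.0  # the pairwise distance is the sum of edges on the path
        d[row] = dist[frozenset((a, b))]
    lengths, *_ = np.linalg.lstsq(A, d, rcond=None)
    return float(lengths.sum())


def place_read(edges, leaves, dist, read="read"):
    """Try the read on every edge; return the edge giving the shortest tree."""
    scores = {}
    for u, v in edges:
        # Split edge (u, v) with a new internal node and hang the read off it.
        candidate = [e for e in edges if e != (u, v)]
        candidate += [(u, "_new"), ("_new", v), ("_new", read)]
        scores[(u, v)] = ols_tree_length(candidate, leaves + [read], dist)
    return min(scores, key=scores.get), scores


# Hypothetical reference tree ((A,B),(C,D)) with internal nodes x and y.
edges = [("A", "x"), ("B", "x"), ("x", "y"), ("C", "y"), ("D", "y")]
leaves = ["A", "B", "C", "D"]
# Illustrative additive distances; the read was generated on the branch leading to C.
raw = {("A", "B"): 3, ("A", "C"): 8, ("A", "D"): 9,
       ("B", "C"): 9, ("B", "D"): 10, ("C", "D"): 9,
       ("read", "A"): 8.5, ("read", "B"): 9.5,
       ("read", "C"): 3.5, ("read", "D"): 9.5}
dist = {frozenset(pair): value for pair, value in raw.items()}

best_edge, scores = place_read(edges, leaves, dist)
print(best_edge, scores[best_edge])
```

Solving a fresh least-squares problem per edge is cubic-time work repeated for every candidate, which is why a practical tool would need the kind of computational shortcuts the abstract alludes to.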

Original language: English (US)
Article number: S13
Journal: BMC Genomics
Volume: 16
Issue number: 1
DOI: https://doi.org/10.1186/1471-2164-16-S1-S13
State: Published - Jan 15 2015
Externally published: Yes

Fingerprint

Metagenomics
Sequence Alignment
DNA Sequence Analysis
Computer Simulation
Consensus
Efficiency
Genes

ASJC Scopus subject areas

  • Biotechnology
  • Genetics

Cite this

Filipski, A., Tamura, K., Billing-Ross, P., Murillo, O., & Kumar, S. (2015). Phylogenetic placement of metagenomic reads using the minimum evolution principle. BMC Genomics, 16(1), [S13]. https://doi.org/10.1186/1471-2164-16-S1-S13

@article{ead240715c9c4b31a3ceb9c6cf60c0e0,
title = "Phylogenetic placement of metagenomic reads using the minimum evolution principle",
abstract = "Background: A central problem of computational metagenomics is determining the correct placement into an existing phylogenetic tree of individual reads (nucleotide sequences of varying lengths, ranging from hundreds to thousands of bases) obtained using next-generation sequencing of DNA samples from a mixture of known and unknown species. Correct placement allows us to easily identify or classify the sequences in the sample as to taxonomic position or function. Results: Here we propose a novel method (PhyClass), based on the Minimum Evolution (ME) phylogenetic inference criterion, for determining the appropriate phylogenetic position of each read. Without using heuristics, the new approach efficiently finds the optimal placement of the unknown read in a reference phylogenetic tree given a sequence alignment for the taxa in the tree. In short, the total resulting branch length for the tree is computed for every possible placement of the unknown read and the placement that gives the smallest value for this total is the best (optimal) choice. By taking advantage of computational efficiencies and mathematical formulations, we are able to find the true optimal ME placement for each read in the phylogenetic tree. Using computer simulations, we assessed the accuracy of the new approach for different read lengths over a variety of data sets and phylogenetic trees. We found the accuracy of the new method to be good and comparable to existing Maximum Likelihood (ML) approaches. Conclusions: In particular, we found that the consensus assignments based on ME and ML approaches are more correct than either method individually. This is true even when the statistical support for read assignments was low, which is inevitable given that individual reads are often short and come from only one gene.",
author = "Alan Filipski and Koichiro Tamura and Paul Billing-Ross and Oscar Murillo and Sudhir Kumar",
year = "2015",
month = "1",
day = "15",
doi = "10.1186/1471-2164-16-S1-S13",
language = "English (US)",
volume = "16",
journal = "BMC Genomics",
issn = "1471-2164",
publisher = "BioMed Central",
number = "1",
pages = "S13",
}

TY - JOUR

T1 - Phylogenetic placement of metagenomic reads using the minimum evolution principle

AU - Filipski, Alan

AU - Tamura, Koichiro

AU - Billing-Ross, Paul

AU - Murillo, Oscar

AU - Kumar, Sudhir

PY - 2015/1/15

Y1 - 2015/1/15

N2 - Background: A central problem of computational metagenomics is determining the correct placement into an existing phylogenetic tree of individual reads (nucleotide sequences of varying lengths, ranging from hundreds to thousands of bases) obtained using next-generation sequencing of DNA samples from a mixture of known and unknown species. Correct placement allows us to easily identify or classify the sequences in the sample as to taxonomic position or function. Results: Here we propose a novel method (PhyClass), based on the Minimum Evolution (ME) phylogenetic inference criterion, for determining the appropriate phylogenetic position of each read. Without using heuristics, the new approach efficiently finds the optimal placement of the unknown read in a reference phylogenetic tree given a sequence alignment for the taxa in the tree. In short, the total resulting branch length for the tree is computed for every possible placement of the unknown read and the placement that gives the smallest value for this total is the best (optimal) choice. By taking advantage of computational efficiencies and mathematical formulations, we are able to find the true optimal ME placement for each read in the phylogenetic tree. Using computer simulations, we assessed the accuracy of the new approach for different read lengths over a variety of data sets and phylogenetic trees. We found the accuracy of the new method to be good and comparable to existing Maximum Likelihood (ML) approaches. Conclusions: In particular, we found that the consensus assignments based on ME and ML approaches are more correct than either method individually. This is true even when the statistical support for read assignments was low, which is inevitable given that individual reads are often short and come from only one gene.

UR - http://www.scopus.com/inward/record.url?scp=84924333525&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84924333525&partnerID=8YFLogxK

U2 - 10.1186/1471-2164-16-S1-S13

DO - 10.1186/1471-2164-16-S1-S13

M3 - Article

VL - 16

JO - BMC Genomics

JF - BMC Genomics

SN - 1471-2164

IS - 1

M1 - S13

ER -