TY - JOUR
T1 - Phylogenetic placement of metagenomic reads using the minimum evolution principle
AU - Filipski, Alan
AU - Tamura, Koichiro
AU - Billing-Ross, Paul
AU - Murillo, Oscar
AU - Kumar, Sudhir
N1 - Funding Information:
Publication charges for this article have been funded from research grants from National Institutes of Health (NIH; HG002096-12) and HiCi-1434-117-1 from KAU. This article has been published as part of BMC Genomics Volume 16 Supplement 1, 2015: Selected articles from the 2nd International Genomic Medical Conference (IGMC 2013): Genomics. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcgenomics/supplements/16/S1. Author affiliations: 1. Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA. 2. Department of Biological Sciences, Tokyo Metropolitan University, Tokyo, Japan. 3. Department of Molecular Biology and Genetics, College of Liberal Arts and Sciences, Cornell University, Ithaca, NY 14853-5905, USA. 4. Department of Biology, Temple University, Philadelphia, PA 19122, USA. 5. Center for Genomic Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia.
Funding Information:
This work was supported by funding from National Institutes of Health (NIH; HG002096-12 to SK and HG006039-02 to A.F.). O.M. was also supported by a training program (NIH, R25GM099650). We thank Dr. Rosa Krajmalnic-Brown for helpful conversations on metagenomic requirements.
Publisher Copyright:
© 2014 Filipski et al.
PY - 2015/1/15
Y1 - 2015/1/15
N2 - Background: A central problem of computational metagenomics is determining the correct placement into an existing phylogenetic tree of individual reads (nucleotide sequences of varying lengths, ranging from hundreds to thousands of bases) obtained using next-generation sequencing of DNA samples from a mixture of known and unknown species. Correct placement allows us to easily identify or classify the sequences in the sample as to taxonomic position or function. Results: Here we propose a novel method (PhyClass), based on the Minimum Evolution (ME) phylogenetic inference criterion, for determining the appropriate phylogenetic position of each read. Without using heuristics, the new approach efficiently finds the optimal placement of the unknown read in a reference phylogenetic tree given a sequence alignment for the taxa in the tree. In short, the total resulting branch length for the tree is computed for every possible placement of the unknown read and the placement that gives the smallest value for this total is the best (optimal) choice. By taking advantage of computational efficiencies and mathematical formulations, we are able to find the true optimal ME placement for each read in the phylogenetic tree. Using computer simulations, we assessed the accuracy of the new approach for different read lengths over a variety of data sets and phylogenetic trees. We found the accuracy of the new method to be good and comparable to existing Maximum Likelihood (ML) approaches. Conclusions: In particular, we found that the consensus assignments based on ME and ML approaches are more correct than either method individually. This is true even when the statistical support for read assignments was low, which is inevitable given that individual reads are often short and come from only one gene.
UR - http://www.scopus.com/inward/record.url?scp=84924333525&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84924333525&partnerID=8YFLogxK
U2 - 10.1186/1471-2164-16-S1-S13
DO - 10.1186/1471-2164-16-S1-S13
M3 - Article
AN - SCOPUS:84924333525
SN - 1471-2164
VL - 16
JO - BMC Genomics
JF - BMC Genomics
IS - 1
M1 - S13
ER -