TY - JOUR
T1 - Phylogeny-Aware Analysis of Metagenome Community Ecology Based on Matched Reference Genomes while Bypassing Taxonomy
AU - Zhu, Qiyun
AU - Huang, Shi
AU - Gonzalez, Antonio
AU - McGrath, Imran
AU - McDonald, Daniel
AU - Haiminen, Niina
AU - Armstrong, George
AU - Vazquez-Baeza, Yoshiki
AU - Yu, Julian
AU - Kuczynski, Justin
AU - Sepich-Poore, Gregory D.
AU - Swafford, Austin D.
AU - Das, Promi
AU - Shaffer, Justin P.
AU - Lejzerowicz, Franck
AU - Belda-Ferre, Pedro
AU - Havulinna, Aki S.
AU - Meric, Guillaume
AU - Niiranen, Teemu
AU - Lahti, Leo
AU - Salomaa, Veikko
AU - Kim, Ho Cheol
AU - Jain, Mohit
AU - Inouye, Michael
AU - Gilbert, Jack A.
AU - Knight, Rob
N1 - Funding Information:
This work was supported in part by an Arizona State University start-up grant (to Q.Z.), Sloan Foundation G-2017-9838, IBM Research AI through the AI Horizons Network-AI for Healthy Living A1770534, DARPA JUMP/CRISP, NIH P30DK120515, DP1AT010885, U19AG063744, U24CA248454, Emerald Foundation Distinguished Investigator Award, Crohn’s and Colitis Foundation 675191, NSF RAPID 2038509, IBM Research AI through the AI Horizons Network and the UC San Diego Center for Microbiome Innovation (to S.H., I.M., Y.V.-B., and R.K.). G.D.S.-P. is supported by a fellowship from the National Institutes of Health (F30 CA243480). T.N. was funded by the Emil Aaltonen Foundation, the Finnish Medical Foundation, the Finnish Foundation for Cardiovascular Disease, and the Academy of Finland (grant 321351). L.L. was funded by the Academy of Finland (grant 295741). V.S. was supported by the Finnish Foundation for Cardiovascular Research. J.P.S. was supported by NIH/NIGMS IRACDA K12 GM068524. This work used the Comet supercomputer at the San Diego Supercomputer Center through allocation BIO150043 through the Extreme Science and Engineering Discovery Environment (XSEDE).
Publisher Copyright:
© 2022 American Society for Microbiology. All rights reserved.
PY - 2022/4
Y1 - 2022/4
AB - We introduce the operational genomic unit (OGU) method, a metagenome analysis strategy that directly exploits sequence alignment hits to individual reference genomes as the minimum unit for assessing the diversity of microbial communities and their relevance to environmental factors. This approach is independent of taxonomic classification, granting the possibility of maximal resolution of community composition, and organizes features into an accurate hierarchy using a phylogenomic tree. The outputs are suitable for contemporary analytical protocols for community ecology, differential abundance, and supervised learning while supporting phylogenetic methods, such as UniFrac and phylofactorization, that are seldom applied to shotgun metagenomics despite being prevalent in 16S rRNA gene amplicon studies. As demonstrated in two real-world case studies, the OGU method produces biologically meaningful patterns from microbiome data sets. Such patterns further remain detectable at very low metagenomic sequencing depths. Compared with taxonomic unit-based analyses implemented in currently adopted metagenomics tools, and the analysis of 16S rRNA gene amplicon sequence variants, this method shows superiority in informing biologically relevant insights, including stronger correlation with body environment and host sex on the Human Microbiome Project data set and more accurate prediction of human age by the gut microbiomes of Finnish individuals included in the FINRISK 2002 cohort. We provide Woltka, a bioinformatics tool to implement this method, with full integration with the QIIME 2 package and the Qiita web platform, to facilitate adoption of the OGU method in future metagenomics studies.
KW - UniFrac
KW - metagenomics
KW - operational genomic unit
KW - reference phylogeny
KW - supervised learning
KW - taxonomy independent
UR - http://www.scopus.com/inward/record.url?scp=85129109833&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85129109833&partnerID=8YFLogxK
U2 - 10.1128/msystems.00167-22
DO - 10.1128/msystems.00167-22
M3 - Article
AN - SCOPUS:85129109833
SN - 2379-5077
VL - 7
JO - mSystems
JF - mSystems
IS - 2
ER -