TY - JOUR
T1 - Combining NMR and LC/MS using backward variable elimination
T2 - Metabolomics analysis of colorectal cancer, polyps, and healthy controls
AU - Deng, Lingli
AU - Gu, Haiwei
AU - Zhu, Jiangjiang
AU - Nagana Gowda, G. A.
AU - Djukovic, Danijel
AU - Chiorean, E. Gabriela
AU - Raftery, Daniel
N1 - Funding Information:
This work was supported in part by the National Institutes of Health (Grants 2R01 GM085291 and 2P30 CA015704), AMRMC Grant W81XWH-10-0540, the China Scholarship Council, the Chinese National Instrumentation Program (2011YQ170067), the PCSIRT program (IRT13054), the National Natural Science Foundation of China (21365001), the Science and Technology Planning Project at the Ministry of Science and Technology of Jiangxi Province, China (No. 20152ACH80010), the ITHS Rising Stars Program (UL1TR000423), and the University of Washington. The authors also thank Dr. Lin Lin (Department of Statistics, The Pennsylvania State University, University Park, PA) for her help with data analysis and the reviewers for their helpful comments.
Publisher Copyright:
© 2016 American Chemical Society.
PY - 2016/8/16
Y1 - 2016/8/16
N2 - Both nuclear magnetic resonance (NMR) spectroscopy and mass spectrometry (MS) play important roles in metabolomics. The complementary features of NMR and MS make their combination very attractive; however, currently the vast majority of metabolomics studies use either NMR or MS separately, and variable selection that combines NMR and MS for biomarker identification and statistical modeling is still not well developed. In this study focused on methodology, we developed a backward variable elimination partial least-squares discriminant analysis algorithm embedded with Monte Carlo cross validation (MCCV-BVE-PLSDA), to combine NMR and targeted liquid chromatography (LC)/MS data. Using the metabolomics analysis of serum for the detection of colorectal cancer (CRC) and polyps as an example, we demonstrate that variable selection is vitally important in combining NMR and MS data. The combined approach was better than using NMR or LC/MS data alone in providing significantly improved predictive accuracy in all the pairwise comparisons among CRC, polyps, and healthy controls. Using this approach, we selected a subset of metabolites responsible for the improved separation for each pairwise comparison, and we achieved a comprehensive profile of altered metabolite levels, including those in glycolysis, the TCA cycle, amino acid metabolism, and other pathways that were related to CRC and polyps. MCCV-BVE-PLSDA is straightforward, easy to implement, and highly useful for studying the contribution of each individual variable to multivariate statistical models. On the basis of these results, we recommend using an appropriate variable selection step, such as MCCV-BVE-PLSDA, when analyzing data from multiple analytical platforms to obtain improved statistical performance and a more accurate biological interpretation, especially for biomarker discovery. Importantly, the approach described here is relatively universal and can be easily expanded for combination with other analytical technologies.
UR - http://www.scopus.com/inward/record.url?scp=84983242613&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84983242613&partnerID=8YFLogxK
U2 - 10.1021/acs.analchem.6b00885
DO - 10.1021/acs.analchem.6b00885
M3 - Article
C2 - 27437783
AN - SCOPUS:84983242613
SN - 0003-2700
VL - 88
SP - 7975
EP - 7983
JO - Analytical Chemistry
JF - Analytical Chemistry
IS - 16
ER -