Statistical analysis and modeling of mass spectrometry-based metabolomics data

Bowei Xi, Haiwei Gu, Hamid Baniasadi, Daniel Raftery

Research output: Contribution to journal, Article

44 Citations (Scopus)

Abstract

Multivariate statistical techniques are used extensively in metabolomics studies, for tasks ranging from biomarker selection to model building and validation. This chapter discusses two model-independent variable selection techniques, principal component analysis and two-sample t-tests, as well as classification and regression models and model-related variable selection techniques, including partial least squares, logistic regression, support vector machines, and random forests. Model evaluation and validation methods, such as leave-one-out cross-validation, Monte Carlo cross-validation, and receiver operating characteristic analysis, are introduced with an emphasis on avoiding over-fitting the data. The advantages and limitations of these statistical techniques are also discussed.
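As a rough illustration of how some of the techniques named in the abstract fit together, the sketch below combines two-sample t-test variable selection, Monte Carlo cross-validation, and the area under the ROC curve. It is not the chapter's own code. The synthetic data, the function names (make_data, welch_t, auc, select_and_score, monte_carlo_cv), and the simple signed-sum scorer standing in for a real classifier such as PLS or an SVM are all assumptions made for illustration. The point it tries to show is that variable selection is repeated inside every training split, so the held-out samples never influence which variables are chosen.

```python
import random
import statistics


def make_data(n_per_class=20, n_feat=50, n_informative=5, shift=1.5, seed=1):
    """Hypothetical synthetic data: Gaussian noise features, with the first
    n_informative features shifted upward for class 1."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_feat)]
            for j in range(n_informative):
                row[j] += shift * label
            X.append(row)
            y.append(label)
    return X, y


def welch_t(a, b):
    """Welch two-sample t statistic (unequal variances) for two lists."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se if se > 0 else 0.0


def auc(scores, labels):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def select_and_score(X_tr, y_tr, X_te, k):
    """Select the k features with the largest |t| using the training split
    only, then score each test sample by a signed sum of those features.
    The signed sum is a stand-in for a real classifier."""
    stats = []
    for j in range(len(X_tr[0])):
        a = [x[j] for x, lab in zip(X_tr, y_tr) if lab == 1]
        b = [x[j] for x, lab in zip(X_tr, y_tr) if lab == 0]
        stats.append((welch_t(a, b), j))
    top = sorted(stats, key=lambda s: -abs(s[0]))[:k]
    return [sum((1 if t > 0 else -1) * x[j] for t, j in top) for x in X_te]


def monte_carlo_cv(X, y, k=5, n_splits=50, test_frac=0.3, seed=0):
    """Repeated random stratified train/test splits; returns the mean test AUC."""
    rng = random.Random(seed)
    by_class = [[i for i, lab in enumerate(y) if lab == c] for c in (0, 1)]
    aucs = []
    for _ in range(n_splits):
        test = set()
        for idx in by_class:
            test.update(rng.sample(idx, max(1, round(test_frac * len(idx)))))
        train = [i for i in range(len(y)) if i not in test]
        test = sorted(test)
        scores = select_and_score(
            [X[i] for i in train], [y[i] for i in train], [X[i] for i in test], k
        )
        aucs.append(auc(scores, [y[i] for i in test]))
    return statistics.mean(aucs)


if __name__ == "__main__":
    X_sig, y_sig = make_data(shift=1.5)   # a few truly informative features
    X_null, y_null = make_data(shift=0.0)  # pure noise, no real class difference
    print("mean AUC, informative data:", round(monte_carlo_cv(X_sig, y_sig), 3))
    print("mean AUC, pure-noise data: ", round(monte_carlo_cv(X_null, y_null), 3))
```

Because the selection step sees only training samples, the pure-noise data set should score near chance rather than appearing predictive. Selecting variables once on the full data set before splitting would be expected to inflate the estimate, which is the kind of over-fitting the abstract warns about.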

Original language: English (US)
Pages (from-to): 333-353
Number of pages: 21
Journal: Methods in Molecular Biology
Volume: 1198
DOIs: 10.1007/978-1-4939-1258-2_22
State: Published - Jan 1 2014
Externally published: Yes

Fingerprint

Metabolomics
Principal Component Analysis
Least-Squares Analysis
ROC Curve
Mass Spectrometry
Biomarkers
Logistic Models
Support Vector Machine
Forests

Keywords

  • Classification
  • Mass spectrometry
  • Metabolomics
  • Multivariate statistics

ASJC Scopus subject areas

  • Molecular Biology
  • Genetics

Cite this

Statistical analysis and modeling of mass spectrometry-based metabolomics data. / Xi, Bowei; Gu, Haiwei; Baniasadi, Hamid; Raftery, Daniel.

In: Methods in Molecular Biology, Vol. 1198, 01.01.2014, p. 333-353.

@article{9c653d3c643f42c1882cac9a7c6dce85,
title = "Statistical analysis and modeling of mass spectrometry-based metabolomics data",
abstract = "Multivariate statistical techniques are used extensively in metabolomics studies, for tasks ranging from biomarker selection to model building and validation. This chapter discusses two model-independent variable selection techniques, principal component analysis and two-sample t-tests, as well as classification and regression models and model-related variable selection techniques, including partial least squares, logistic regression, support vector machines, and random forests. Model evaluation and validation methods, such as leave-one-out cross-validation, Monte Carlo cross-validation, and receiver operating characteristic analysis, are introduced with an emphasis on avoiding over-fitting the data. The advantages and limitations of these statistical techniques are also discussed.",
keywords = "Classification, Mass spectrometry, Metabolomics, Multivariate statistics",
author = "Bowei Xi and Haiwei Gu and Hamid Baniasadi and Daniel Raftery",
year = "2014",
month = "1",
day = "1",
doi = "10.1007/978-1-4939-1258-2_22",
language = "English (US)",
volume = "1198",
pages = "333--353",
journal = "Methods in molecular biology (Clifton, N.J.)",
issn = "1064-3745",
publisher = "Humana Press",

}
