Clustering and training set selection methods for improving the accuracy of quantitative laser induced breakdown spectroscopy

Ryan B. Anderson, James Bell, Roger C. Wiens, Richard V. Morris, Samuel M. Clegg

Research output: Contribution to journal (Article)

21 Scopus citations


We investigated five clustering and training set selection methods to improve the accuracy of quantitative chemical analysis of geologic samples by laser-induced breakdown spectroscopy (LIBS) using partial least squares (PLS) regression. The LIBS spectra were previously acquired for 195 rock slabs and 31 pressed powder geostandards under 7 Torr CO₂ at a stand-off distance of 7 m and 17 mJ per pulse to simulate the operational conditions of the ChemCam LIBS instrument on the Mars Science Laboratory Curiosity rover. The clustering and training set selection methods, which do not require prior knowledge of the chemical composition of the test-set samples, group similar spectra and select appropriate training spectra for the multi-response PLS (PLS2) model. These methods were: (1) hierarchical clustering of the full set of training spectra and selection of a subset for use in training; (2) k-means clustering of all spectra and generation of PLS2 models based on the training samples within each cluster; (3) iterative use of PLS2 to predict sample composition and k-means clustering of the predicted compositions to subdivide the groups of spectra; (4) soft independent modeling of class analogy (SIMCA) classification of spectra and generation of PLS2 models based on the training samples within each class; (5) use of Bayesian information criteria (BIC) to determine an optimal number of clusters and generation of PLS2 models based on the training samples within each cluster. The iterative method and the k-means method using 5 clusters showed the best performance, improving the absolute quadrature root mean squared error (RMSE) by ∼3 wt.%. The statistical significance of these improvements was ∼85%. Our results show that although clustering methods can modestly improve results, a large and diverse training set is the most reliable way to improve the accuracy of quantitative LIBS. In particular, additional sulfate standards and specifically fabricated analog samples with Mars-like compositions may improve the accuracy of ChemCam measurements on Mars. Refinement of the iterative method, modifications of the basic k-means clustering algorithm, and classification based on specifically selected S, C and Si emission lines may also prove beneficial and merit further study.
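As a rough illustration of method (2), the sketch below clusters training and test spectra together with a basic k-means and then picks, for a test spectrum, the training samples that fall in the same cluster. It is not the authors' implementation: the function names, the toy three-channel "spectra", and the seeded initialization are all assumptions made for this example, and the PLS2 fit on each selected subset is left out. The `quadrature_rmse` helper assumes that the quadrature RMSE is the root-sum-of-squares of the per-element RMSEs, which the abstract does not spell out.

```python
import math
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two spectra."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids, spectrum):
    """Index of the centroid closest to the given spectrum."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(centroids[i], spectrum))


def kmeans(spectra, k, iterations=50, seed=0):
    """Basic Lloyd's k-means; returns the k centroids."""
    rng = random.Random(seed)  # seeded so the toy example is repeatable
    centroids = [list(s) for s in rng.sample(spectra, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for s in spectra:
            groups[nearest(centroids, s)].append(s)
        # Mean of each group; an empty group keeps its previous centroid.
        updated = [
            [sum(col) / len(g) for col in zip(*g)] if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def select_training_indices(train_spectra, centroids, test_spectrum):
    """Indices of training spectra in the same cluster as the test spectrum."""
    cluster = nearest(centroids, test_spectrum)
    return [i for i, s in enumerate(train_spectra)
            if nearest(centroids, s) == cluster]


def quadrature_rmse(predicted, actual):
    """Root-sum-of-squares of per-element RMSEs (assumed definition)."""
    n_samples = len(actual)
    total = 0.0
    for j in range(len(actual[0])):
        mse = sum((p[j] - a[j]) ** 2
                  for p, a in zip(predicted, actual)) / n_samples
        total += mse  # mse is the squared RMSE for element j
    return math.sqrt(total)


# Toy data (hypothetical): two well-separated groups of 3-channel spectra.
train = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [0.0, 0.1, 1.0], [0.1, 0.2, 0.9]]
test = [[0.95, 0.15, 0.05]]

centroids = kmeans(train + test, k=2)
selected = select_training_indices(train, centroids, test[0])
print(selected)  # [0, 1]

# Per-element RMSEs of 3 and 4 wt.% combine to 5 wt.% in quadrature.
print(quadrature_rmse([[50.0, 10.0], [52.0, 12.0]],
                      [[53.0, 14.0], [49.0, 8.0]]))  # 5.0
```

In a fuller pipeline, a separate regression model would be trained on each selected subset and applied to the test spectra assigned to that cluster; with real LIBS data the number of clusters and the initialization would need care, since k-means can converge to different partitions.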

Original language: English (US)
Pages (from-to): 24-32
Number of pages: 9
Journal: Spectrochimica Acta - Part B Atomic Spectroscopy
State: Published - Apr 1 2012



Keywords

  • ChemCam
  • Laser-induced breakdown spectroscopy
  • Mars
  • Multivariate analysis

ASJC Scopus subject areas

  • Analytical Chemistry
  • Atomic and Molecular Physics, and Optics
  • Instrumentation
  • Spectroscopy
