Abstract

An automated approach is proposed for extracting interpretable features from univariate or multivariate profiles (functional data). A landmark alignment algorithm is modified, and the alignment is combined with piecewise linear approximations. Least absolute shrinkage and selection operator (lasso) regression is used to select the most important intercepts and slopes, which yields an alternative to partial least squares for modeling a response associated with the profiles. Latent variables can be difficult to interpret, whereas our extracted features correspond simply to the slopes and intercepts of particular parts of the profiles. Features that relate to the degree of warping between a given profile and a reference can also be extracted as predictors. Selection criteria for the number of knots and for common knot locations across profiles are developed. We apply the proposed method to batch fermentation data in which the profiles consist of on-line measurements of process variables and the response is the corresponding process yield. The extracted features are readily interpretable (with a large reduction in dimension) and, combined with the lasso, give prediction accuracy comparable with that of partial least squares applied to the original profiles. The proposed feature extraction method is also applied to publicly available data in which near infrared spectra define the profiles; there, the prediction accuracy of our feature lasso method is comparable with that of more complicated alternatives.
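The pipeline outlined in the abstract (align each profile to a reference, approximate it piecewise linearly, then let the lasso select among the segment intercepts and slopes) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the alignment is a generic landmark registration by piecewise-linear time warping rather than the paper's modified algorithm, the knots are taken as given (the paper's knot selection criteria are not reproduced), a segment's intercept is defined here as the fitted level at its left knot, landmark shifts stand in for the warping features, and the lasso is solved by plain coordinate descent. The function names `align_to_reference`, `segment_features` and `lasso_cd` are hypothetical.

```python
import numpy as np


def align_to_reference(t, y, landmarks, ref_landmarks, grid):
    """Generic landmark registration with a piecewise-linear time warp (sketch).

    landmarks / ref_landmarks: increasing landmark times for this profile and
    for the reference, both including the start and end of the time range.
    Returns the aligned profile on `grid` and the interior landmark shifts,
    used here as hypothetical "degree of warping" features.
    """
    landmarks = np.asarray(landmarks, dtype=float)
    ref_landmarks = np.asarray(ref_landmarks, dtype=float)
    # h(t) maps each landmark of this profile onto the matching reference landmark
    h = np.interp(t, landmarks, ref_landmarks)
    # Resample the warped profile, y(h^{-1}(s)), on the common grid s
    y_aligned = np.interp(grid, h, y)
    shifts = (landmarks - ref_landmarks)[1:-1]  # interior landmarks only
    return y_aligned, shifts


def segment_features(t, y, knots):
    """Intercept and slope of a least-squares line on each segment [k_i, k_{i+1}].

    The intercept is the fitted level at the segment's left knot (an assumption
    chosen for interpretability). Each segment needs at least two sample points.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    feats = []
    for lo, hi in zip(knots[:-1], knots[1:]):
        mask = (t >= lo) & (t <= hi)
        slope, intercept = np.polyfit(t[mask] - lo, y[mask], 1)
        feats.extend([intercept, slope])
    return np.array(feats)


def lasso_cd(X, y, lam, n_iter=1000, tol=1e-10):
    """Lasso by cyclic coordinate descent.

    Minimises (1/2n)||y - Xb||^2 + lam * ||b||_1 after centring y and
    standardising the columns of X; returns b on the standardised scale.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # constant columns become zero after centring, so b_j stays 0
    Xs = (X - X.mean(axis=0)) / sd
    r = np.asarray(y, dtype=float) - np.mean(y)  # residual for b = 0
    b = np.zeros(p)
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            old = b[j]
            r += Xs[:, j] * old  # partial residual without feature j
            rho = Xs[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0)  # soft-threshold
            r -= Xs[:, j] * b[j]
            max_step = max(max_step, abs(b[j] - old))
        if max_step < tol:  # no coefficient moved noticeably in this sweep
            break
    return b
```

In use, each aligned profile's segment features and landmark shifts would be concatenated into one row of a feature matrix and `lasso_cd` fitted against the response; the nonzero coefficients then point to the profile segments (or warping features) that matter. In practice the penalty `lam` would be chosen by cross-validation.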

Original language: English (US)
Pages (from-to): 327-344
Number of pages: 18
Journal: Journal of the Royal Statistical Society. Series C: Applied Statistics
Volume: 61
Issue number: 2
DOI: 10.1111/j.1467-9876.2011.01032.x
State: Published - Mar 2012

Keywords

  • Alignment
  • Functional partial least squares and principal components regression
  • Knot selection
  • Lasso
  • Near infrared spectra
  • Segmentation

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

Cite this

@article{15676e5f1f1f4b68bbcdcbc4f256235d,
title = "Automated feature extraction from profiles with application to a batch fermentation process",
keywords = "Alignment, Functional partial least squares and principal components regression, Knot selection, Lasso, Near infrared spectra, Segmentation",
author = "Andersen, {Stina W.} and George Runger",
year = "2012",
month = mar,
doi = "10.1111/j.1467-9876.2011.01032.x",
language = "English (US)",
volume = "61",
pages = "327--344",
journal = "Journal of the Royal Statistical Society. Series C: Applied Statistics",
issn = "0035-9254",
publisher = "Wiley-Blackwell",
number = "2",

}

TY  - JOUR
T1  - Automated feature extraction from profiles with application to a batch fermentation process
AU  - Andersen, Stina W.
AU  - Runger, George
PY  - 2012/3
Y1  - 2012/3
KW  - Alignment
KW  - Functional partial least squares and principal components regression
KW  - Knot selection
KW  - Lasso
KW  - Near infrared spectra
KW  - Segmentation
UR  - http://www.scopus.com/inward/record.url?scp=84858151833&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84858151833&partnerID=8YFLogxK
U2  - 10.1111/j.1467-9876.2011.01032.x
DO  - 10.1111/j.1467-9876.2011.01032.x
M3  - Article
AN  - SCOPUS:84858151833
VL  - 61
SP  - 327
EP  - 344
JO  - Journal of the Royal Statistical Society. Series C: Applied Statistics
JF  - Journal of the Royal Statistical Society. Series C: Applied Statistics
SN  - 0035-9254
IS  - 2
ER  -