Feature selection for noisy variation patterns using kernel principal component analysis

Abstract

Kernel Principal Component Analysis (KPCA) is widely used to understand and visualize non-linear variation patterns by inverse mapping the projected data from a high-dimensional feature space back to the original input space. Variation patterns often involve only a small number of relevant features out of the overall set of features recorded in the data, so it is crucial to discern the set of relevant features that define the pattern. Here we propose a feature selection procedure that augments KPCA to obtain importance estimates of the features from noisy training data. Our strategy projects the data points onto sparse random vectors before computing the kernel matrix. We then match pairs of such projections and determine the preimages of the data with and without a given feature; the differences between preimages within each pair are used to gauge that feature's importance and thus to identify the relevant features. An advantage of our method is that it can be used with any suitable KPCA algorithm. Moreover, the computations are easily parallelized, leading to significant speedup. We demonstrate our method on several simulated and real data sets, and compare the results to alternative approaches in the literature.
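The preimage idea at the core of the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm (it omits the sparse random projections and pairwise matching); it only shows, using scikit-learn's `KernelPCA` with a learned inverse map, how per-feature preimage reconstruction error can serve as a crude importance proxy. The toy data, the `gamma` value, and the variance-normalized error score are all illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)

# Toy data: 200 points, 5 features. Features 0 and 1 carry a nonlinear
# (parabolic) variation pattern; features 2-4 are pure noise.
n, d = 200, 5
t = rng.uniform(-1.0, 1.0, n)
X = 0.05 * rng.normal(size=(n, d))
X[:, 0] += t
X[:, 1] += t ** 2

# Fit KPCA with an inverse map, then reconstruct preimages of the
# projected points back in the original input space.
kpca = KernelPCA(n_components=1, kernel="rbf", gamma=2.0,
                 fit_inverse_transform=True)
X_hat = kpca.inverse_transform(kpca.fit_transform(X))

# Per-feature preimage error, normalized by feature variance (an assumed
# heuristic, not the paper's score): features that follow the dominant
# pattern should reconstruct relatively well.
err = np.mean((X - X_hat) ** 2, axis=0) / X.var(axis=0)
ranking = np.argsort(err)  # candidate ordering, most relevant first
```

In the paper's actual procedure, such preimage comparisons are made between matched pairs of sparse random projections with and without a feature, which is what makes the importance estimates robust to noise and amenable to parallelization.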

Original language: English (US)
Pages (from-to): 37-47
Number of pages: 11
Journal: Knowledge-Based Systems
Volume: 72
State: Published - December 1, 2014

Keywords

  • Feature ensembles
  • Kernel feature space
  • Nonlinear PCA
  • Preimages
  • Variation patterns

ASJC Scopus subject areas

  • Management Information Systems
  • Software
  • Information Systems and Management
  • Artificial Intelligence

