Gaussian mixture model with feature selection: An embedded approach

Yinlin Fu, Xiaonan Liu, Suryadipto Sarkar, Teresa Wu

Research output: Contribution to journal › Article › peer-review

31 Scopus citations

Abstract

The Gaussian Mixture Model (GMM) is a popular clustering algorithm owing to its neat statistical properties, which enable “soft” clustering and the determination of the number of clusters. Expectation-Maximization (EM) is usually applied to estimate the GMM parameters. While promising, including features that do not contribute to clustering may confuse the model and increase computational cost. Recognizing this issue, in this paper we propose a new algorithm, termed Expectation Selection Maximization (ESM), which adds a feature selection step (S) to EM. Specifically, we introduce a relevancy index (RI), a metric indicating the probability of assigning a data point to a specific clustering group. RI reveals the contribution of each feature to the clustering process and thus can assist feature selection. We conduct a theoretical analysis to justify the use of RI for feature selection. Also, to demonstrate the efficacy of the proposed ESM, we study two synthetic datasets, four benchmark datasets, and an Alzheimer's Disease dataset.
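The abstract does not give the RI formula, so the following is only a minimal pure-Python sketch of where an S step could sit between the E and M steps of EM for a diagonal-covariance GMM. The feature score in `s_step` (a soft-assignment ratio of between-cluster to within-cluster variance, compared against a hypothetical `threshold`) is an assumed stand-in for RI, not the authors' relevancy index. The names `esm_sketch` and `warmup`, and the farthest-first initialisation, are likewise hypothetical, illustrative choices.

```python
import math
import random


def log_gauss(x, mu, var):
    """Log-density of a univariate Gaussian N(mu, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def e_step(X, weights, means, variances, active):
    """E-step: responsibilities r[n][k], computed only on the active features."""
    R = []
    for x in X:
        logp = [math.log(max(weights[k], 1e-300))
                + sum(log_gauss(x[d], means[k][d], variances[k][d]) for d in active)
                for k in range(len(weights))]
        top = max(logp)  # log-sum-exp shift for numerical stability
        p = [math.exp(v - top) for v in logp]
        total = sum(p)
        R.append([v / total for v in p])
    return R


def s_step(X, R, active, threshold):
    """HYPOTHETICAL S-step (not the paper's RI): keep feature d if its
    between-cluster variance, measured with the soft assignments R, is at
    least `threshold` times its within-cluster variance."""
    N, K = len(X), len(R[0])
    Nk = [sum(R[n][k] for n in range(N)) + 1e-12 for k in range(K)]
    kept = []
    for d in active:
        mu = [sum(R[n][k] * X[n][d] for n in range(N)) / Nk[k] for k in range(K)]
        grand = sum(X[n][d] for n in range(N)) / N
        between = sum(Nk[k] * (mu[k] - grand) ** 2 for k in range(K)) / N
        within = sum(R[n][k] * (X[n][d] - mu[k]) ** 2
                     for n in range(N) for k in range(K)) / N
        if between >= threshold * max(within, 1e-12):
            kept.append(d)
    return kept or list(active)  # never discard every feature


def m_step(X, R, active, D, var_floor=1e-6):
    """M-step: diagonal-covariance updates for the active features only.
    Dropped features keep placeholder values (mean 0, variance 1)."""
    N, K = len(X), len(R[0])
    Nk = [sum(R[n][k] for n in range(N)) for k in range(K)]
    weights = [nk / N for nk in Nk]
    means = [[0.0] * D for _ in range(K)]
    variances = [[1.0] * D for _ in range(K)]
    for k in range(K):
        nk = Nk[k] + 1e-12
        for d in active:
            mu = sum(R[n][k] * X[n][d] for n in range(N)) / nk
            var = sum(R[n][k] * (X[n][d] - mu) ** 2 for n in range(N)) / nk
            means[k][d] = mu
            variances[k][d] = max(var, var_floor)
    return weights, means, variances


def esm_sketch(X, K, n_iter=30, warmup=5, threshold=0.5, seed=0):
    """E -> S -> M loop. Returns (weights, means, variances, active_features)."""
    rng = random.Random(seed)
    D = len(X[0])
    active = list(range(D))
    # Farthest-first initialisation of the K means (an illustrative choice).
    centers = [list(rng.choice(X))]
    while len(centers) < K:
        far = max(X, key=lambda x: min(sum((a - b) ** 2 for a, b in zip(x, c))
                                       for c in centers))
        centers.append(list(far))
    weights = [1.0 / K] * K
    means = centers
    variances = [[1.0] * D for _ in range(K)]
    for it in range(n_iter):
        R = e_step(X, weights, means, variances, active)          # E
        if it >= warmup:  # let EM settle before judging features
            active = s_step(X, R, active, threshold)             # S
        weights, means, variances = m_step(X, R, active, D)      # M
    return weights, means, variances, active


if __name__ == "__main__":
    # Feature 0 separates two clusters; feature 1 is pure noise, so it is
    # the feature the S-step is meant to drop.
    rng = random.Random(1)
    X = [[rng.gauss(c, 1.0), rng.gauss(0.0, 1.0)]
         for c in (-5.0, 5.0) for _ in range(100)]
    _, _, _, active = esm_sketch(X, K=2)
    print("active features:", active)
```

In this sketch the S step reuses the responsibilities from the E step, so the following M step refits only the retained features, which is where a saving in computation would come from. The warm-up iterations are a design assumption: scoring features on the very first, randomly initialised responsibilities could discard an informative feature too early.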

Original language: English (US)
Article number: 107000
Journal: Computers and Industrial Engineering
Volume: 152
DOIs
State: Published - Feb 2021

Keywords

  • Expectation Maximization (EM)
  • Feature selection
  • Gaussian Mixture Model (GMM)

ASJC Scopus subject areas

  • General Computer Science
  • General Engineering
