TY - JOUR
T1 - Gaussian mixture model with feature selection
T2 - An embedded approach
AU - Fu, Yinlin
AU - Liu, Xiaonan
AU - Sarkar, Suryadipto
AU - Wu, Teresa
N1 - Publisher Copyright:
© 2020 Elsevier Ltd
PY - 2021/2
Y1 - 2021/2
AB - The Gaussian Mixture Model (GMM) is a popular clustering algorithm due to its neat statistical properties, which enable “soft” clustering and the determination of the number of clusters. Expectation-Maximization (EM) is usually applied to estimate the GMM parameters. While promising, the inclusion of features that do not contribute to clustering may confuse the model and increase computational cost. Recognizing this issue, in this paper we propose a new algorithm, termed Expectation Selection Maximization (ESM), which adds a feature selection step (S). Specifically, we introduce a relevancy index (RI), a metric indicating the probability of assigning a data point to a specific cluster. The RI reveals the contribution of each feature to the clustering process and thus can assist feature selection. We conduct theoretical analysis to justify the use of the RI for feature selection. Furthermore, to demonstrate the efficacy of the proposed ESM, two synthetic datasets, four benchmark datasets, and an Alzheimer's Disease dataset are studied.
KW - Expectation Maximization (EM)
KW - Feature selection
KW - Gaussian Mixture Model (GMM)
UR - http://www.scopus.com/inward/record.url?scp=85099266810&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85099266810&partnerID=8YFLogxK
U2 - 10.1016/j.cie.2020.107000
DO - 10.1016/j.cie.2020.107000
M3 - Article
AN - SCOPUS:85099266810
SN - 0360-8352
VL - 152
JO - Computers & Industrial Engineering
JF - Computers & Industrial Engineering
M1 - 107000
ER -