TY - GEN
T1 - Embedded supervised feature selection for multi-class data
AU - Chen, Lin
AU - Tang, Jiliang
AU - Li, Baoxin
N1 - Funding Information:
part by ONR grant N00014-15-1-2344 and ARO grant W911NF1410371. Any opinions expressed in this material are those of the authors and do not necessarily reflect the views of ONR or ARO.
Publisher Copyright:
Copyright © by SIAM.
PY - 2017
Y1 - 2017
AB - Supervised multi-class learning arises in many application domains such as biology, computer vision, social network analysis, and information retrieval. These applications often involve high-dimensional data, which not only significantly increase the time and space requirement of the underlying algorithms but also degrade their performance due to the curse of dimensionality. Feature selection has been proven effective and efficient for preparing high-dimensional data for many learning tasks. Traditional feature selection algorithms for multi-class data assume the independence of label categories and select features with the capability to distinguish samples from different classes. However, class labels in multi-class data may be correlated and little work exists for exploiting label correlation in multi-class feature selection. In this paper, we investigate label correlation in feature selection for multi-class data. In particular, we provide a principled approach for capturing label correlation and propose an Embedded Supervised Feature Selection (ESFS) framework, which embeds label correlation modeling in supervised feature selection for multi-class data. Experiments on both synthetic data and various types of public benchmark datasets show that the proposed framework effectively captures the multi-class label correlation and significantly outperforms existing state-of-the-art baseline methods.
UR - http://www.scopus.com/inward/record.url?scp=85027854621&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85027854621&partnerID=8YFLogxK
DO - 10.1137/1.9781611974973.58
M3 - Conference contribution
AN - SCOPUS:85027854621
T3 - Proceedings of the 17th SIAM International Conference on Data Mining, SDM 2017
SP - 516
EP - 524
BT - Proceedings of the 17th SIAM International Conference on Data Mining, SDM 2017
A2 - Chawla, Nitesh
A2 - Wang, Wei
PB - Society for Industrial and Applied Mathematics Publications
T2 - 17th SIAM International Conference on Data Mining, SDM 2017
Y2 - 27 April 2017 through 29 April 2017
ER -