Embedded supervised feature selection for multi-class data

Lin Chen; Jiliang Tang; Baoxin Li

doi:10.1137/1.9781611974973.58

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

9 Scopus citations

Abstract

Supervised multi-class learning arises in many application domains such as biology, computer vision, social network analysis, and information retrieval. These applications often involve high-dimensional data, which not only significantly increase the time and space requirements of the underlying algorithms but also degrade their performance due to the curse of dimensionality. Feature selection has been proven effective and efficient for preparing high-dimensional data for many learning tasks. Traditional feature selection algorithms for multi-class data assume that label categories are independent and select features that can distinguish samples from different classes. However, class labels in multi-class data may be correlated, and little work exists on exploiting label correlation in multi-class feature selection. In this paper, we investigate label correlation in feature selection for multi-class data. In particular, we provide a principled approach for capturing label correlation and propose an Embedded Supervised Feature Selection (ESFS) framework, which embeds label correlation modeling in supervised feature selection for multi-class data. Experiments on both synthetic data and various types of public benchmark datasets show that the proposed framework effectively captures the multi-class label correlation and significantly outperforms existing state-of-the-art baseline methods.
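
The abstract does not state the ESFS objective, so the sketch below is not the authors' method and does not model label correlation. It only illustrates what "embedded" feature selection can look like for multi-class data, using a swapped-in, generic technique: least squares on one-hot labels with an l2,1-norm penalty, solved by iteratively reweighted least squares. The function name `l21_feature_scores`, the parameters `alpha`, `n_iter`, and `eps`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def l21_feature_scores(X, y, alpha=1.0, n_iter=30, eps=1e-8):
    """Score features with l2,1-regularized least squares.

    Generic embedded selection (hypothetical helper, NOT the ESFS framework):
    minimize ||Xc W - Yc||_F^2 + alpha * sum_i ||W[i, :]||_2,
    then score feature i by the norm of row i of W.
    """
    classes = np.unique(y)
    # One-hot label matrix: one column per class.
    Y = (y[:, None] == classes[None, :]).astype(float)
    # Center features and labels so no intercept term is needed.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    XtX = Xc.T @ Xc
    XtY = Xc.T @ Yc
    d = np.ones(X.shape[1])  # diagonal reweighting terms, one per feature
    for _ in range(n_iter):
        # Reweighted ridge step for the current row-norm estimates.
        W = np.linalg.solve(XtX + alpha * np.diag(d), XtY)
        row_norms = np.linalg.norm(W, axis=1)
        # Rows with small norms get a larger penalty in the next step.
        d = 1.0 / (2.0 * row_norms + eps)
    return np.linalg.norm(W, axis=1)


if __name__ == "__main__":
    # Synthetic data (an assumption): 20 features, only the first 3 carry
    # class information; feature j is shifted when the label equals j.
    rng = np.random.default_rng(0)
    y = rng.integers(0, 3, size=300)
    X = rng.normal(size=(300, 20))
    X[:, :3] += 3.0 * np.eye(3)[y]
    scores = l21_feature_scores(X, y)
    print("highest-scoring features:", np.argsort(scores)[::-1][:3])
```

Selection is "embedded" here in the sense that feature scores fall out of fitting the predictive model itself, rather than from a separate filter or wrapper step. Each label column is treated independently in the loss, which is exactly the independence assumption the abstract argues against.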

Original language	English (US)
Title of host publication	Proceedings of the 17th SIAM International Conference on Data Mining, SDM 2017
Editors	Nitesh Chawla, Wei Wang
Publisher	Society for Industrial and Applied Mathematics Publications
Pages	516-524
Number of pages	9
ISBN (Electronic)	9781611974874
DOIs	https://doi.org/10.1137/1.9781611974973.58
State	Published - 2017
Event	17th SIAM International Conference on Data Mining, SDM 2017 - Houston, United States Duration: Apr 27 2017 → Apr 29 2017

Publication series

Name	Proceedings of the 17th SIAM International Conference on Data Mining, SDM 2017

Other

Other	17th SIAM International Conference on Data Mining, SDM 2017
Country/Territory	United States
City	Houston
Period	4/27/17 → 4/29/17

ASJC Scopus subject areas

Software
Computer Science Applications

Cite this

Chen, L., Tang, J., & Li, B. (2017). Embedded supervised feature selection for multi-class data. In N. Chawla, & W. Wang (Eds.), Proceedings of the 17th SIAM International Conference on Data Mining, SDM 2017 (pp. 516-524). (Proceedings of the 17th SIAM International Conference on Data Mining, SDM 2017). Society for Industrial and Applied Mathematics Publications. https://doi.org/10.1137/1.9781611974973.58

Embedded supervised feature selection for multi-class data. / Chen, Lin; Tang, Jiliang; Li, Baoxin.
Proceedings of the 17th SIAM International Conference on Data Mining, SDM 2017. ed. / Nitesh Chawla; Wei Wang. Society for Industrial and Applied Mathematics Publications, 2017. p. 516-524 (Proceedings of the 17th SIAM International Conference on Data Mining, SDM 2017).

Chen, L, Tang, J & Li, B 2017, Embedded supervised feature selection for multi-class data. in N Chawla & W Wang (eds), Proceedings of the 17th SIAM International Conference on Data Mining, SDM 2017. Proceedings of the 17th SIAM International Conference on Data Mining, SDM 2017, Society for Industrial and Applied Mathematics Publications, pp. 516-524, 17th SIAM International Conference on Data Mining, SDM 2017, Houston, United States, 4/27/17. https://doi.org/10.1137/1.9781611974973.58

Chen L, Tang J, Li B. Embedded supervised feature selection for multi-class data. In Chawla N, Wang W, editors, Proceedings of the 17th SIAM International Conference on Data Mining, SDM 2017. Society for Industrial and Applied Mathematics Publications. 2017. p. 516-524. (Proceedings of the 17th SIAM International Conference on Data Mining, SDM 2017). doi: 10.1137/1.9781611974973.58

Chen, Lin ; Tang, Jiliang ; Li, Baoxin. / Embedded supervised feature selection for multi-class data. Proceedings of the 17th SIAM International Conference on Data Mining, SDM 2017. editor / Nitesh Chawla ; Wei Wang. Society for Industrial and Applied Mathematics Publications, 2017. pp. 516-524 (Proceedings of the 17th SIAM International Conference on Data Mining, SDM 2017).

@inproceedings{977a6d6de3d64dfcb3d02626e9d4095d,

title = "Embedded supervised feature selection for multi-class data",

abstract = "Supervised multi-class learning arises in many application domains such as biology, computer vision, social network analysis, and information retrieval. These applications often involve high-dimensional data, which not only significantly increase the time and space requirement of the underlying algorithms but also degrade their performance due to the curse of dimensionality. Feature selection has been proven effective and efficient for preparing high-dimensional data for many learning tasks. Traditional feature selection algorithms for multi-class data assume the independence of label categories and select features with the capability to distinguish samples from different classes. However, class labels in multi-class data may be correlated and little work exists for exploiting label correlation in multi-class feature selection. In this paper, we investigate label correlation in feature selection for multi-class data. In particular, we provide a principled approach for capturing label correlation and propose an Embedded Supervised Feature Selection (ESFS) framework, which embeds label correlation modeling in supervised feature selection for multi-class data. Experiments on both synthetic data and various types of public benchmark datasets show that the proposed framework effectively captures the multi-class label correlation and significantly outperforms existing state-of-the-art baseline methods.",

author = "Lin Chen and Jiliang Tang and Baoxin Li",

note = "Funding Information: part by ONR grant N00014-15-1-2344 and ARO grant W911NF1410371. Any opinions expressed in this material are those of the authors and do not necessarily reflect the views of ONR or ARO. Publisher Copyright: Copyright {\textcopyright} by SIAM.; 17th SIAM International Conference on Data Mining, SDM 2017 ; Conference date: 27-04-2017 Through 29-04-2017",

year = "2017",

doi = "10.1137/1.9781611974973.58",

language = "English (US)",

series = "Proceedings of the 17th SIAM International Conference on Data Mining, SDM 2017",

publisher = "Society for Industrial and Applied Mathematics Publications",

pages = "516--524",

editor = "Nitesh Chawla and Wei Wang",

booktitle = "Proceedings of the 17th SIAM International Conference on Data Mining, SDM 2017",

}

TY - GEN

T1 - Embedded supervised feature selection for multi-class data

AU - Chen, Lin

AU - Tang, Jiliang

AU - Li, Baoxin

N1 - Funding Information: part by ONR grant N00014-15-1-2344 and ARO grant W911NF1410371. Any opinions expressed in this material are those of the authors and do not necessarily reflect the views of ONR or ARO. Publisher Copyright: Copyright © by SIAM.

PY - 2017

Y1 - 2017

N2 - Supervised multi-class learning arises in many application domains such as biology, computer vision, social network analysis, and information retrieval. These applications often involve high-dimensional data, which not only significantly increase the time and space requirement of the underlying algorithms but also degrade their performance due to the curse of dimensionality. Feature selection has been proven effective and efficient for preparing high-dimensional data for many learning tasks. Traditional feature selection algorithms for multi-class data assume the independence of label categories and select features with the capability to distinguish samples from different classes. However, class labels in multi-class data may be correlated and little work exists for exploiting label correlation in multi-class feature selection. In this paper, we investigate label correlation in feature selection for multi-class data. In particular, we provide a principled approach for capturing label correlation and propose an Embedded Supervised Feature Selection (ESFS) framework, which embeds label correlation modeling in supervised feature selection for multi-class data. Experiments on both synthetic data and various types of public benchmark datasets show that the proposed framework effectively captures the multi-class label correlation and significantly outperforms existing state-of-the-art baseline methods.

UR - http://www.scopus.com/inward/record.url?scp=85027854621&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85027854621&partnerID=8YFLogxK

U2 - 10.1137/1.9781611974973.58

DO - 10.1137/1.9781611974973.58

M3 - Conference contribution

AN - SCOPUS:85027854621

T3 - Proceedings of the 17th SIAM International Conference on Data Mining, SDM 2017

SP - 516

EP - 524

BT - Proceedings of the 17th SIAM International Conference on Data Mining, SDM 2017

A2 - Chawla, Nitesh

A2 - Wang, Wei

PB - Society for Industrial and Applied Mathematics Publications

T2 - 17th SIAM International Conference on Data Mining, SDM 2017

Y2 - 27 April 2017 through 29 April 2017

ER -
