Abstract

Supervised multi-class learning arises in many application domains such as biology, computer vision, social network analysis, and information retrieval. These applications often involve high-dimensional data, which not only significantly increase the time and space requirements of the underlying algorithms but also degrade their performance due to the curse of dimensionality. Feature selection has been proven effective and efficient for preparing high-dimensional data for many learning tasks. Traditional feature selection algorithms for multi-class data assume the independence of label categories and select features with the capability to distinguish samples from different classes. However, class labels in multi-class data may be correlated, and little work exists on exploiting label correlation in multi-class feature selection. In this paper, we investigate label correlation in feature selection for multi-class data. In particular, we provide a principled approach for capturing label correlation and propose an Embedded Supervised Feature Selection (ESFS) framework, which embeds label correlation modeling in supervised feature selection for multi-class data. Experiments on both synthetic data and various types of public benchmark datasets show that the proposed framework effectively captures the multi-class label correlation and significantly outperforms existing state-of-the-art baseline methods.

Original language: English (US)
Title of host publication: Proceedings of the 17th SIAM International Conference on Data Mining, SDM 2017
Publisher: Society for Industrial and Applied Mathematics Publications
Pages: 516-524
Number of pages: 9
ISBN (Electronic): 9781611974874
State: Published - 2017
Event: 17th SIAM International Conference on Data Mining, SDM 2017 - Houston, United States
Duration: Apr 27 2017 to Apr 29 2017

Other

Other: 17th SIAM International Conference on Data Mining, SDM 2017
Country: United States
City: Houston
Period: 4/27/17 to 4/29/17

Fingerprint

Feature extraction
Labels
Electric network analysis
Information retrieval
Computer vision
Experiments

ASJC Scopus subject areas

  • Software
  • Computer Science Applications

Cite this

Chen, L., Tang, J., & Li, B. (2017). Embedded supervised feature selection for multi-class data. In Proceedings of the 17th SIAM International Conference on Data Mining, SDM 2017 (pp. 516-524). Society for Industrial and Applied Mathematics Publications.

Embedded supervised feature selection for multi-class data. / Chen, Lin; Tang, Jiliang; Li, Baoxin.

Proceedings of the 17th SIAM International Conference on Data Mining, SDM 2017. Society for Industrial and Applied Mathematics Publications, 2017. p. 516-524.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Chen, L, Tang, J & Li, B 2017, Embedded supervised feature selection for multi-class data. in Proceedings of the 17th SIAM International Conference on Data Mining, SDM 2017. Society for Industrial and Applied Mathematics Publications, pp. 516-524, 17th SIAM International Conference on Data Mining, SDM 2017, Houston, United States, 4/27/17.
Chen L, Tang J, Li B. Embedded supervised feature selection for multi-class data. In Proceedings of the 17th SIAM International Conference on Data Mining, SDM 2017. Society for Industrial and Applied Mathematics Publications. 2017. p. 516-524
Chen, Lin ; Tang, Jiliang ; Li, Baoxin. / Embedded supervised feature selection for multi-class data. Proceedings of the 17th SIAM International Conference on Data Mining, SDM 2017. Society for Industrial and Applied Mathematics Publications, 2017. pp. 516-524
@inproceedings{977a6d6de3d64dfcb3d02626e9d4095d,
title = "Embedded supervised feature selection for multi-class data",
abstract = "Supervised multi-class learning arises in many application domains such as biology, computer vision, social network analysis, and information retrieval. These applications often involve high-dimensional data, which not only significantly increase the time and space requirements of the underlying algorithms but also degrade their performance due to the curse of dimensionality. Feature selection has been proven effective and efficient for preparing high-dimensional data for many learning tasks. Traditional feature selection algorithms for multi-class data assume the independence of label categories and select features with the capability to distinguish samples from different classes. However, class labels in multi-class data may be correlated, and little work exists on exploiting label correlation in multi-class feature selection. In this paper, we investigate label correlation in feature selection for multi-class data. In particular, we provide a principled approach for capturing label correlation and propose an Embedded Supervised Feature Selection (ESFS) framework, which embeds label correlation modeling in supervised feature selection for multi-class data. Experiments on both synthetic data and various types of public benchmark datasets show that the proposed framework effectively captures the multi-class label correlation and significantly outperforms existing state-of-the-art baseline methods.",
author = "Lin Chen and Jiliang Tang and Baoxin Li",
year = "2017",
language = "English (US)",
pages = "516--524",
booktitle = "Proceedings of the 17th SIAM International Conference on Data Mining, SDM 2017",
publisher = "Society for Industrial and Applied Mathematics Publications",
address = "United States",
}

TY - GEN
T1 - Embedded supervised feature selection for multi-class data
AU - Chen, Lin
AU - Tang, Jiliang
AU - Li, Baoxin
PY - 2017
Y1 - 2017

N2 - Supervised multi-class learning arises in many application domains such as biology, computer vision, social network analysis, and information retrieval. These applications often involve high-dimensional data, which not only significantly increase the time and space requirements of the underlying algorithms but also degrade their performance due to the curse of dimensionality. Feature selection has been proven effective and efficient for preparing high-dimensional data for many learning tasks. Traditional feature selection algorithms for multi-class data assume the independence of label categories and select features with the capability to distinguish samples from different classes. However, class labels in multi-class data may be correlated, and little work exists on exploiting label correlation in multi-class feature selection. In this paper, we investigate label correlation in feature selection for multi-class data. In particular, we provide a principled approach for capturing label correlation and propose an Embedded Supervised Feature Selection (ESFS) framework, which embeds label correlation modeling in supervised feature selection for multi-class data. Experiments on both synthetic data and various types of public benchmark datasets show that the proposed framework effectively captures the multi-class label correlation and significantly outperforms existing state-of-the-art baseline methods.

UR - http://www.scopus.com/inward/record.url?scp=85027854621&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85027854621&partnerID=8YFLogxK
M3 - Conference contribution
SP - 516
EP - 524
BT - Proceedings of the 17th SIAM International Conference on Data Mining, SDM 2017
PB - Society for Industrial and Applied Mathematics Publications
ER -