Efficient feature selection via analysis of relevance and redundancy

Lei Yu, Huan Liu

Research output: Contribution to journal › Article

1338 Scopus citations

Abstract

Feature selection is applied to reduce the number of features in many applications where data has hundreds or thousands of features. Existing feature selection methods mainly focus on finding relevant features. In this paper, we show that feature relevance alone is insufficient for efficient feature selection of high-dimensional data. We define feature redundancy and propose to perform explicit redundancy analysis in feature selection. A new framework is introduced that decouples relevance analysis and redundancy analysis. We develop a correlation-based method for relevance and redundancy analysis, and conduct an empirical study of its efficiency and effectiveness compared with representative methods.
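The decoupled framework described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not name the correlation measure, so absolute Pearson correlation is assumed here, and the function name `select_features` and the parameter `relevance_threshold` are hypothetical. Stage 1 keeps features that correlate with the target above a threshold; stage 2 drops a feature when it correlates with an already kept feature at least as strongly as it does with the target.

```python
import numpy as np


def select_features(X, y, relevance_threshold=0.1):
    """Two-stage filter sketch: relevance analysis, then redundancy analysis.

    Assumptions (not taken from the source): absolute Pearson correlation is
    the measure, and relevance_threshold is an illustrative cut-off.
    """
    n_features = X.shape[1]

    # Stage 1 (relevance): correlation of each feature with the target.
    relevance = np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)]
    )
    # Keep only sufficiently relevant features, most relevant first.
    candidates = [
        int(j) for j in np.argsort(-relevance) if relevance[j] > relevance_threshold
    ]

    # Stage 2 (redundancy): a candidate is treated as redundant if some
    # already selected feature correlates with it at least as strongly
    # as the candidate correlates with the target.
    selected = []
    for j in candidates:
        redundant = any(
            abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) >= relevance[j]
            for k in selected
        )
        if not redundant:
            selected.append(j)
    return selected
```

Because each candidate is compared only against features already kept, the redundancy stage never needs the full feature-by-feature correlation matrix, which is the kind of saving that matters when there are thousands of features.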

Original language: English (US)
Pages (from-to): 1205-1224
Number of pages: 20
Journal: Journal of Machine Learning Research
Volume: 5
State: Published - Oct 1 2004

Keywords

  • Feature selection
  • High dimensionality
  • Redundancy
  • Relevance
  • Supervised learning

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Statistics and Probability
  • Artificial Intelligence
