Efficient feature selection via analysis of relevance and redundancy

Lei Yu, Huan Liu

Research output: Contribution to journal › Article

1256 Citations (Scopus)

Abstract

Feature selection is applied to reduce the number of features in many applications where data has hundreds or thousands of features. Existing feature selection methods mainly focus on finding relevant features. In this paper, we show that feature relevance alone is insufficient for efficient feature selection of high-dimensional data. We define feature redundancy and propose to perform explicit redundancy analysis in feature selection. A new framework is introduced that decouples relevance analysis and redundancy analysis. We develop a correlation-based method for relevance and redundancy analysis, and conduct an empirical study of its efficiency and effectiveness comparing with representative methods.
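
The decoupled framework described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the algorithm, so everything below is an assumption made for illustration rather than the authors' published method: symmetrical uncertainty stands in for the correlation measure, features are assumed to be discrete, and the names `select_features` and `threshold` are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(x, y) = 2 * I(x; y) / (H(x) + H(y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    mutual_info = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (hx + hy)


def select_features(columns, target, threshold=0.0):
    """Two-stage filter: relevance analysis, then redundancy analysis.

    columns   -- dict mapping feature name to a list of discrete values
    target    -- list of class labels, same length as each column
    threshold -- hypothetical relevance cutoff (an assumption, not from the source)
    """
    # Stage 1 (relevance): score each feature against the class and
    # keep those above the cutoff, most relevant first.
    relevance = {
        name: symmetrical_uncertainty(col, target)
        for name, col in columns.items()
    }
    ranked = sorted(
        (name for name in relevance if relevance[name] > threshold),
        key=lambda name: relevance[name],
        reverse=True,
    )

    # Stage 2 (redundancy): drop a feature when an already selected
    # feature is at least as correlated with it as it is with the class.
    selected = []
    for name in ranked:
        redundant = any(
            symmetrical_uncertainty(columns[kept], columns[name]) >= relevance[name]
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected


if __name__ == "__main__":
    data = {
        "a": [0, 0, 1, 1],
        "a_copy": [0, 0, 1, 1],  # duplicates "a", so it is redundant
        "noise": [0, 1, 0, 1],   # independent of the class, so it is irrelevant
    }
    print(select_features(data, target=[0, 0, 1, 1]))
```

In this toy data, the relevance stage discards `noise` and the redundancy stage discards `a_copy`, which shows why the two analyses are kept as separate steps: a feature can be highly relevant and still add nothing once a correlated feature has been selected.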

Original language: English (US)
Pages (from-to): 1205-1224
Number of pages: 20
Journal: Journal of Machine Learning Research
Volume: 5
State: Published - Oct 1 2004

Keywords

  • Feature selection
  • High dimensionality
  • Redundancy
  • Relevance
  • Supervised learning

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability

Cite this

Efficient feature selection via analysis of relevance and redundancy. / Yu, Lei; Liu, Huan.

In: Journal of Machine Learning Research, Vol. 5, 01.10.2004, p. 1205-1224.

@article{e9612315789c4285a26e917c96b94e6a,
title = "Efficient feature selection via analysis of relevance and redundancy",
abstract = "Feature selection is applied to reduce the number of features in many applications where data has hundreds or thousands of features. Existing feature selection methods mainly focus on finding relevant features. In this paper, we show that feature relevance alone is insufficient for efficient feature selection of high-dimensional data. We define feature redundancy and propose to perform explicit redundancy analysis in feature selection. A new framework is introduced that decouples relevance analysis and redundancy analysis. We develop a correlation-based method for relevance and redundancy analysis, and conduct an empirical study of its efficiency and effectiveness comparing with representative methods.",
keywords = "Feature selection, High dimensionality, Redundancy, Relevance, Supervised learning",
author = "Lei Yu and Huan Liu",
year = "2004",
month = "10",
day = "1",
language = "English (US)",
volume = "5",
pages = "1205--1224",
journal = "Journal of Machine Learning Research",
issn = "1532-4435",
publisher = "Microtome Publishing",
}

TY  - JOUR
T1  - Efficient feature selection via analysis of relevance and redundancy
AU  - Yu, Lei
AU  - Liu, Huan
PY  - 2004/10/1
Y1  - 2004/10/1
AB  - Feature selection is applied to reduce the number of features in many applications where data has hundreds or thousands of features. Existing feature selection methods mainly focus on finding relevant features. In this paper, we show that feature relevance alone is insufficient for efficient feature selection of high-dimensional data. We define feature redundancy and propose to perform explicit redundancy analysis in feature selection. A new framework is introduced that decouples relevance analysis and redundancy analysis. We develop a correlation-based method for relevance and redundancy analysis, and conduct an empirical study of its efficiency and effectiveness comparing with representative methods.
KW  - Feature selection
KW  - High dimensionality
KW  - Redundancy
KW  - Relevance
KW  - Supervised learning
UR  - http://www.scopus.com/inward/record.url?scp=25144492516&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=25144492516&partnerID=8YFLogxK
M3  - Article
VL  - 5
SP  - 1205
EP  - 1224
JO  - Journal of Machine Learning Research
JF  - Journal of Machine Learning Research
SN  - 1532-4435
ER  -