Efficient feature selection via analysis of relevance and redundancy

Lei Yu; Huan Liu

Research output: Contribution to journal › Article › peer-review

Abstract

Feature selection is applied to reduce the number of features in many applications where data has hundreds or thousands of features. Existing feature selection methods mainly focus on finding relevant features. In this paper, we show that feature relevance alone is insufficient for efficient feature selection of high-dimensional data. We define feature redundancy and propose to perform explicit redundancy analysis in feature selection. A new framework is introduced that decouples relevance analysis and redundancy analysis. We develop a correlation-based method for relevance and redundancy analysis, and conduct an empirical study of its efficiency and effectiveness comparing with representative methods.
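The abstract does not name the correlation measure or the rule used to detect redundancy. The sketch below is illustrative only and makes two assumptions. First, it uses symmetrical uncertainty, SU(X, Y) = 2[H(X) - H(X|Y)] / [H(X) + H(Y)], as the correlation measure between discrete variables. Second, it marks a feature redundant when a more relevant, already-kept feature is at least as correlated with it as it is with the class. The names `select_features` and `delta` are hypothetical. The sketch shows how a decoupled framework can run relevance analysis first and redundancy analysis second.

```python
from collections import Counter
from collections.abc import Hashable, Sequence
from math import log2


def entropy(xs: Sequence[Hashable]) -> float:
    """Shannon entropy H(X), in bits, of a discrete sequence."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())


def conditional_entropy(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """H(X | Y) = sum over y of P(y) * H(X | Y = y)."""
    n = len(ys)
    total = 0.0
    for y, count in Counter(ys).items():
        subset = [x for x, yy in zip(xs, ys) if yy == y]
        total += (count / n) * entropy(subset)
    return total


def symmetrical_uncertainty(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """SU(X, Y) = 2 * [H(X) - H(X|Y)] / [H(X) + H(Y)], a value in [0, 1]."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        return 0.0  # both variables constant: SU defined as 0 (assumption)
    info_gain = hx - conditional_entropy(xs, ys)
    return 2.0 * info_gain / (hx + hy)


def select_features(features: dict, labels: Sequence[Hashable], delta: float = 0.0) -> list:
    """Hypothetical two-stage filter: relevance analysis, then redundancy analysis."""
    # Stage 1 (relevance): keep features whose SU with the class exceeds delta,
    # ordered from most to least relevant (ties keep their input order).
    relevance = {name: symmetrical_uncertainty(col, labels) for name, col in features.items()}
    ranked = sorted(
        (name for name, su in relevance.items() if su > delta),
        key=lambda name: relevance[name],
        reverse=True,
    )
    # Stage 2 (redundancy): drop q if some already-kept, more relevant feature p
    # is at least as correlated with q as q is with the class.
    selected = []
    for q in ranked:
        if all(symmetrical_uncertainty(features[p], features[q]) < relevance[q] for p in selected):
            selected.append(q)
    return selected
```

For example, a feature that copies the class label is kept, an exact duplicate of it is discarded as redundant, and a constant feature is discarded as irrelevant. Keeping the two stages separate means the pairwise feature-to-feature comparisons run only over features that already passed the relevance threshold.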

Original language	English (US)
Pages (from-to)	1205-1224
Number of pages	20
Journal	Journal of Machine Learning Research
Volume	5
State	Published - Oct 1 2004

Keywords

Feature selection
High dimensionality
Redundancy
Relevance
Supervised learning

ASJC Scopus subject areas

Software
Control and Systems Engineering
Statistics and Probability
Artificial Intelligence

Cite this

@article{e9612315789c4285a26e917c96b94e6a,
title = "Efficient feature selection via analysis of relevance and redundancy",
abstract = "Feature selection is applied to reduce the number of features in many applications where data has hundreds or thousands of features. Existing feature selection methods mainly focus on finding relevant features. In this paper, we show that feature relevance alone is insufficient for efficient feature selection of high-dimensional data. We define feature redundancy and propose to perform explicit redundancy analysis in feature selection. A new framework is introduced that decouples relevance analysis and redundancy analysis. We develop a correlation-based method for relevance and redundancy analysis, and conduct an empirical study of its efficiency and effectiveness comparing with representative methods.",
keywords = "Feature selection, High dimensionality, Redundancy, Relevance, Supervised learning",
author = "Lei Yu and Huan Liu",
note = "Publisher Copyright: {\textcopyright} 2004 Lei Yu and Huan Liu.",
year = "2004",
month = oct,
day = "1",
language = "English (US)",
volume = "5",
pages = "1205--1224",
journal = "Journal of Machine Learning Research",
issn = "1532-4435",
publisher = "Microtome Publishing",
}

TY  - JOUR
T1  - Efficient feature selection via analysis of relevance and redundancy
AU  - Yu, Lei
AU  - Liu, Huan
PY  - 2004/10/1
Y1  - 2004/10/1
N2  - Feature selection is applied to reduce the number of features in many applications where data has hundreds or thousands of features. Existing feature selection methods mainly focus on finding relevant features. In this paper, we show that feature relevance alone is insufficient for efficient feature selection of high-dimensional data. We define feature redundancy and propose to perform explicit redundancy analysis in feature selection. A new framework is introduced that decouples relevance analysis and redundancy analysis. We develop a correlation-based method for relevance and redundancy analysis, and conduct an empirical study of its efficiency and effectiveness comparing with representative methods.
AB  - Feature selection is applied to reduce the number of features in many applications where data has hundreds or thousands of features. Existing feature selection methods mainly focus on finding relevant features. In this paper, we show that feature relevance alone is insufficient for efficient feature selection of high-dimensional data. We define feature redundancy and propose to perform explicit redundancy analysis in feature selection. A new framework is introduced that decouples relevance analysis and redundancy analysis. We develop a correlation-based method for relevance and redundancy analysis, and conduct an empirical study of its efficiency and effectiveness comparing with representative methods.
KW  - Feature selection
KW  - High dimensionality
KW  - Redundancy
KW  - Relevance
KW  - Supervised learning
UR  - http://www.scopus.com/inward/record.url?scp=25144492516&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=25144492516&partnerID=8YFLogxK
M3  - Article
AN  - SCOPUS:25144492516
SN  - 1532-4435
VL  - 5
SP  - 1205
EP  - 1224
JO  - Journal of Machine Learning Research
JF  - Journal of Machine Learning Research
ER  - 