Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution

Lei Yu, Huan Liu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1234 Citations (Scopus)

Abstract

Feature selection, as a preprocessing step to machine learning, is effective in reducing dimensionality, removing irrelevant data, increasing learning accuracy, and improving result comprehensibility. However, the recent increase of dimensionality of data poses a severe challenge to many existing feature selection methods with respect to efficiency and effectiveness. In this work, we introduce a novel concept, predominant correlation, and propose a fast filter method which can identify relevant features as well as redundancy among relevant features without pairwise correlation analysis. The efficiency and effectiveness of our method are demonstrated through extensive comparisons with other methods using real-world data of high dimensionality.
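The abstract does not spell out the procedure, so the following is only a minimal sketch of how a filter built on predominant correlation might work. It assumes discrete features, uses symmetric uncertainty as the correlation measure, and treats the threshold `delta` as a user-chosen parameter; the function names, the data layout (a dict of columns), and the toy data are all hypothetical and are not taken from this record.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X | Y) / (H(X) + H(Y)); 0 = independent, 1 = fully dependent."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    info_gain = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * info_gain / (hx + hy)


def fcbf(columns, labels, delta):
    """Sketch of a correlation-based filter (assumed procedure, not the authors' code).

    columns: dict mapping feature name -> list of discrete values
    labels:  list of class labels
    delta:   relevance threshold on SU(feature, class)
    """
    # Step 1: keep features whose correlation with the class reaches delta,
    # ranked from most to least relevant.
    su_class = {name: symmetric_uncertainty(col, labels)
                for name, col in columns.items()}
    ranked = sorted((n for n in su_class if su_class[n] >= delta),
                    key=lambda n: su_class[n], reverse=True)

    # Step 2: each kept feature removes lower-ranked features that correlate
    # with it at least as strongly as they correlate with the class.
    selected = []
    while ranked:
        p = ranked.pop(0)
        selected.append(p)
        ranked = [q for q in ranked
                  if symmetric_uncertainty(columns[p], columns[q]) < su_class[q]]
    return selected


# Toy data (hypothetical): the class is determined by two independent bits.
labels = [0, 0, 1, 1, 2, 2, 3, 3]
columns = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],       # relevant
    "a_copy": [0, 0, 0, 0, 1, 1, 1, 1],  # relevant but redundant with "a"
    "b": [0, 0, 1, 1, 0, 0, 1, 1],       # relevant, independent of "a"
    "noise": [0, 1, 0, 1, 0, 1, 0, 1],   # irrelevant to the class
}
print(fcbf(columns, labels, delta=0.1))  # ['a', 'b']
```

In this sketch, features are compared only against the features already kept rather than against every other feature, which is one way to read the abstract's claim of avoiding pairwise correlation analysis.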

Original language: English (US)
Title of host publication: Proceedings, Twentieth International Conference on Machine Learning
Editors: T. Fawcett, N. Mishra
Pages: 856-863
Number of pages: 8
Volume: 2
State: Published - 2003
Event: Proceedings, Twentieth International Conference on Machine Learning - Washington, DC, United States
Duration: Aug 21 2003 to Aug 24 2003

Other

Other: Proceedings, Twentieth International Conference on Machine Learning
Country: United States
City: Washington, DC
Period: 8/21/03 to 8/24/03

Fingerprint

Feature extraction
Redundancy
Learning systems

ASJC Scopus subject areas

  • Engineering (all)

Cite this

Yu, L., & Liu, H. (2003). Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution. In T. Fawcett, & N. Mishra (Eds.), Proceedings, Twentieth International Conference on Machine Learning (Vol. 2, pp. 856-863)

@inproceedings{14e3662b1cd64554adaeb966bb2a44c4,
title = "Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution",
abstract = "Feature selection, as a preprocessing step to machine learning, is effective in reducing dimensionality, removing irrelevant data, increasing learning accuracy, and improving result comprehensibility. However, the recent increase of dimensionality of data poses a severe challenge to many existing feature selection methods with respect to efficiency and effectiveness. In this work, we introduce a novel concept, predominant correlation, and propose a fast filter method which can identify relevant features as well as redundancy among relevant features without pairwise correlation analysis. The efficiency and effectiveness of our method are demonstrated through extensive comparisons with other methods using real-world data of high dimensionality.",
author = "Lei Yu and Huan Liu",
year = "2003",
language = "English (US)",
isbn = "1577351894",
volume = "2",
pages = "856--863",
editor = "T. Fawcett and N. Mishra",
booktitle = "Proceedings, Twentieth International Conference on Machine Learning",

}

TY - GEN

T1 - Feature Selection for High-Dimensional Data

T2 - A Fast Correlation-Based Filter Solution

AU - Yu, Lei

AU - Liu, Huan

PY - 2003

Y1 - 2003

N2 - Feature selection, as a preprocessing step to machine learning, is effective in reducing dimensionality, removing irrelevant data, increasing learning accuracy, and improving result comprehensibility. However, the recent increase of dimensionality of data poses a severe challenge to many existing feature selection methods with respect to efficiency and effectiveness. In this work, we introduce a novel concept, predominant correlation, and propose a fast filter method which can identify relevant features as well as redundancy among relevant features without pairwise correlation analysis. The efficiency and effectiveness of our method are demonstrated through extensive comparisons with other methods using real-world data of high dimensionality.

UR - http://www.scopus.com/inward/record.url?scp=1942451938&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=1942451938&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:1942451938

SN - 1577351894

VL - 2

SP - 856

EP - 863

BT - Proceedings, Twentieth International Conference on Machine Learning

A2 - Fawcett, T.

A2 - Mishra, N.

ER -