Abstract
Feature selection, as a preprocessing step to machine learning, is effective in reducing dimensionality, removing irrelevant data, increasing learning accuracy, and improving result comprehensibility. However, the recent increase in the dimensionality of data poses a severe challenge to many existing feature selection methods with respect to efficiency and effectiveness. In this work, we introduce a novel concept, predominant correlation, and propose a fast filter method that can identify relevant features as well as redundancy among relevant features without pairwise correlation analysis. The efficiency and effectiveness of our method are demonstrated through extensive comparisons with other methods using real-world data of high dimensionality.
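The abstract's two-step idea — rank features by their correlation with the class, then prune any feature whose correlation with an already-selected feature dominates its correlation with the class — can be illustrated with a small sketch. This is an illustrative reconstruction, not the paper's reference implementation: the function names, the tie-breaking order, and the use of symmetrical uncertainty over discrete-valued features are assumptions made for the example.

```python
# Sketch of an FCBF-style correlation filter for discrete features.
# Assumption: correlation is measured by symmetrical uncertainty (SU).
from collections import Counter
from math import log2

def entropy(xs):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(xs)
    return -sum(c / n * log2(c / n) for c in Counter(xs).values())

def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)), normalized to [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    hxy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    ig = hx + hy - hxy               # information gain (mutual information)
    return 2 * ig / (hx + hy)

def fcbf(features, labels, delta=0.0):
    """Return indices of selected features.

    features: list of columns, each a list of discrete values.
    delta: relevance threshold on SU with the class.
    """
    # Step 1 (relevance): keep features whose SU with the class exceeds delta,
    # ranked in descending order of that SU (stable sort breaks ties by index).
    su_c = [(symmetrical_uncertainty(f, labels), i)
            for i, f in enumerate(features)]
    ranked = sorted(((s, i) for s, i in su_c if s > delta),
                    key=lambda t: -t[0])
    # Step 2 (redundancy): drop feature Fj if some already-selected Fi has
    # SU(Fi, Fj) >= SU(Fj, C) -- Fi "predominates" Fj's correlation with C.
    selected = []
    for s_j, j in ranked:
        if all(symmetrical_uncertainty(features[i], features[j]) < s_j
               for i in selected):
            selected.append(j)
    return selected
```

On a toy dataset with a perfectly predictive feature, an exact copy of it, and an irrelevant feature, the sketch keeps only the first: the copy is pruned as redundant and the irrelevant column fails the relevance threshold. Note how each redundancy check compares a candidate only against features already selected, rather than against all pairs.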
| Original language | English (US) |
|---|---|
| Title of host publication | Proceedings, Twentieth International Conference on Machine Learning |
| Editors | T. Fawcett, N. Mishra |
| Pages | 856-863 |
| Number of pages | 8 |
| Volume | 2 |
| State | Published - 2003 |
| Event | Twentieth International Conference on Machine Learning, Washington, DC, United States. Duration: Aug 21 2003 → Aug 24 2003 |
ASJC Scopus subject areas
- Engineering (all)
Cite this
Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution. / Yu, Lei; Liu, Huan.
Proceedings, Twentieth International Conference on Machine Learning. ed. / T. Fawcett; N. Mishra. Vol. 2 2003. p. 856-863. Research output: Chapter in Book/Report/Conference proceeding › Conference contribution
TY - GEN
T1 - Feature Selection for High-Dimensional Data
T2 - A Fast Correlation-Based Filter Solution
AU - Yu, Lei
AU - Liu, Huan
PY - 2003
Y1 - 2003
N2 - Feature selection, as a preprocessing step to machine learning, is effective in reducing dimensionality, removing irrelevant data, increasing learning accuracy, and improving result comprehensibility. However, the recent increase in the dimensionality of data poses a severe challenge to many existing feature selection methods with respect to efficiency and effectiveness. In this work, we introduce a novel concept, predominant correlation, and propose a fast filter method that can identify relevant features as well as redundancy among relevant features without pairwise correlation analysis. The efficiency and effectiveness of our method are demonstrated through extensive comparisons with other methods using real-world data of high dimensionality.
UR - http://www.scopus.com/inward/record.url?scp=1942451938&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=1942451938&partnerID=8YFLogxK
M3 - Conference contribution
SN - 1577351894
VL - 2
SP - 856
EP - 863
BT - Proceedings, Twentieth International Conference on Machine Learning
A2 - Fawcett, T.
A2 - Mishra, N.
ER -