Efficiently handling feature redundancy in high-dimensional data

Lei Yu, Huan Liu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

44 Citations (Scopus)

Abstract

High-dimensional data poses a severe challenge for data mining. Feature selection is a frequently used technique in pre-processing high-dimensional data for successful data mining. Traditionally, feature selection is focused on removing irrelevant features. However, for high-dimensional data, removing redundant features is equally critical. In this paper, we provide a study of feature redundancy in high-dimensional data and propose a novel correlation-based approach to feature selection within the filter model. The extensive empirical study using real-world data shows that the proposed approach is efficient and effective in removing redundant and irrelevant features.
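To make the abstract's distinction between irrelevant and redundant features concrete, here is a minimal sketch of a generic correlation-based filter. It is not the algorithm proposed in the paper: the use of absolute Pearson correlation as the measure, the function name `correlation_filter`, and both thresholds are assumptions chosen only for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Returns 0.0 when either sequence is constant (correlation undefined).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlation_filter(features, target, relevance_min=0.3, redundancy_max=0.9):
    """Hypothetical filter; `features` maps a feature name to its column of values.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    # Step 1 (relevance): keep features sufficiently correlated with the target,
    # ordered from most to least relevant.
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    candidates = sorted(
        (name for name in features if relevance[name] >= relevance_min),
        key=lambda name: relevance[name],
        reverse=True,
    )

    # Step 2 (redundancy): skip a candidate that is highly correlated with
    # any feature already selected.
    selected = []
    for name in candidates:
        if all(
            abs(pearson(features[name], features[kept])) < redundancy_max
            for kept in selected
        ):
            selected.append(name)
    return selected


if __name__ == "__main__":
    target = [0, 0, 1, 1, 0, 1, 1, 0]
    features = {
        "f_signal": [0.1, 0.2, 0.9, 1.1, 0.0, 1.0, 0.8, 0.2],
        "f_copy": [0.2, 0.4, 1.8, 2.2, 0.0, 2.0, 1.6, 0.4],  # 2 * f_signal: redundant
        "f_noise": [1, 0, 1, 0, 0, 1, 0, 1],  # uncorrelated with target: irrelevant
    }
    print(correlation_filter(features, target))
```

On this toy data, `f_noise` fails the relevance check in step 1, and only one of the two linearly dependent columns survives step 2. Because the filter scores features without training a classifier, it belongs to the filter model rather than the wrapper model.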

Original language: English (US)
Title of host publication: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Pages: 685-690
Number of pages: 6
DOIs: https://doi.org/10.1145/956750.956840
State: Published - 2003
Event: 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03 - Washington, DC, United States
Duration: Aug 24 2003 – Aug 27 2003

Other

Other: 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03
Country: United States
City: Washington, DC
Period: 8/24/03 – 8/27/03

Fingerprint

Redundancy
Feature extraction
Data mining
Processing

Keywords

  • Feature selection
  • High-dimensional data
  • Redundancy

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Yu, L., & Liu, H. (2003). Efficiently handling feature redundancy in high-dimensional data. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 685-690) https://doi.org/10.1145/956750.956840

@inproceedings{8796b241a0ba4c3ca24ec46cc515dc28,
title = "Efficiently handling feature redundancy in high-dimensional data",
abstract = "High-dimensional data poses a severe challenge for data mining. Feature selection is a frequently used technique in pre-processing high-dimensional data for successful data mining. Traditionally, feature selection is focused on removing irrelevant features. However, for high-dimensional data, removing redundant features is equally critical. In this paper, we provide a study of feature redundancy in high-dimensional data and propose a novel correlation-based approach to feature selection within the filter model. The extensive empirical study using real-world data shows that the proposed approach is efficient and effective in removing redundant and irrelevant features.",
keywords = "Feature selection, High-dimensional data, Redundancy",
author = "Lei Yu and Huan Liu",
year = "2003",
doi = "10.1145/956750.956840",
language = "English (US)",
pages = "685--690",
booktitle = "Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",
}

TY  - GEN
T1  - Efficiently handling feature redundancy in high-dimensional data
AU  - Yu, Lei
AU  - Liu, Huan
PY  - 2003
Y1  - 2003
AB  - High-dimensional data poses a severe challenge for data mining. Feature selection is a frequently used technique in pre-processing high-dimensional data for successful data mining. Traditionally, feature selection is focused on removing irrelevant features. However, for high-dimensional data, removing redundant features is equally critical. In this paper, we provide a study of feature redundancy in high-dimensional data and propose a novel correlation-based approach to feature selection within the filter model. The extensive empirical study using real-world data shows that the proposed approach is efficient and effective in removing redundant and irrelevant features.
KW  - Feature selection
KW  - High-dimensional data
KW  - Redundancy
UR  - http://www.scopus.com/inward/record.url?scp=12244249636&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=12244249636&partnerID=8YFLogxK
U2  - 10.1145/956750.956840
DO  - 10.1145/956750.956840
M3  - Conference contribution
AN  - SCOPUS:12244249636
SP  - 685
EP  - 690
BT  - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
ER  -