Some issues on scalable feature selection

Huan Liu, Rudy Setiono

Research output: Contribution to journal › Article › peer-review

32 Scopus citations

Abstract

Feature selection determines relevant features in the data. It is often applied in pattern classification, data mining, as well as machine learning. A special concern for feature selection nowadays is that the size of a database is normally very large, both vertically and horizontally. In addition, feature sets may grow as the data collection process continues. Effective solutions are needed to accommodate the practical demands. This paper concentrates on three issues: large number of features, large data size, and expanding feature set. For the first issue, we suggest a probabilistic algorithm to select features. For the second issue, we present a scalable probabilistic algorithm that expedites feature selection further and can scale up without sacrificing the quality of selected features. For the third issue, we propose an incremental algorithm that adapts to the newly extended feature set and captures 'concept drifts' by removing features from previously selected and newly added ones. We expect that research on scalable feature selection will be extended to distributed and parallel computing and have impact on applications of data mining and machine learning.
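The abstract does not spell out the probabilistic algorithm, so the following is only an illustrative sketch of what probabilistic feature selection for the first issue could look like, not the authors' method. It assumes discrete feature values, random sampling of candidate subsets, and an inconsistency-rate criterion (the fraction of rows whose label disagrees with the majority label among rows sharing the same values on the chosen features). The names `inconsistency_rate` and `random_subset_search`, and the parameters `max_tries`, `threshold`, and `seed`, are hypothetical.

```python
import random
from collections import Counter, defaultdict


def inconsistency_rate(rows, labels, subset):
    """Fraction of rows that disagree with the majority label of their group,
    where rows are grouped by their values on the features in `subset`."""
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        key = tuple(row[i] for i in subset)
        groups[key][label] += 1
    # In each group, every row outside the majority class counts as a clash.
    clashes = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return clashes / len(rows)


def random_subset_search(rows, labels, max_tries=1000, threshold=0.0, seed=0):
    """Randomly sample feature subsets and keep the smallest one whose
    inconsistency rate stays at or below `threshold` (assumed criterion)."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    best = list(range(n_features))  # start from the full feature set
    for _ in range(max_tries):
        if len(best) == 1:
            break  # cannot shrink further
        # Only try subsets strictly smaller than the best one found so far.
        size = rng.randint(1, len(best) - 1)
        candidate = sorted(rng.sample(range(n_features), size))
        if inconsistency_rate(rows, labels, candidate) <= threshold:
            best = candidate
    return best


if __name__ == "__main__":
    # Toy data: the label is feature 0 XOR feature 2; feature 1 is irrelevant.
    rows = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
    labels = [a ^ c for a, b, c in rows]
    print(random_subset_search(rows, labels))
```

Because each candidate is evaluated independently, a search of this kind could in principle be run on a sample of the rows or spread across workers, which is the direction the abstract points to for the large-data issue; how the paper actually does this is not stated here.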

Original language: English (US)
Pages (from-to): 333-339
Number of pages: 7
Journal: Expert Systems With Applications
Volume: 15
Issue number: 3-4
State: Published - Jan 1 1998
Externally published: Yes

Keywords

  • Features
  • Large databases
  • Pattern classification
  • Probabilistic selection
  • Scalability

ASJC Scopus subject areas

  • Engineering (all)
  • Computer Science Applications
  • Artificial Intelligence
