Some issues on scalable feature selection

Huan Liu, Rudy Setiono

Research output: Contribution to journal (Article)

31 Scopus citations

Abstract

Feature selection determines relevant features in the data. It is often applied in pattern classification, data mining, as well as machine learning. A special concern for feature selection nowadays is that the size of a database is normally very large, both vertically and horizontally. In addition, feature sets may grow as the data collection process continues. Effective solutions are needed to accommodate the practical demands. This paper concentrates on three issues: large number of features, large data size, and expanding feature set. For the first issue, we suggest a probabilistic algorithm to select features. For the second issue, we present a scalable probabilistic algorithm that expedites feature selection further and can scale up without sacrificing the quality of selected features. For the third issue, we propose an incremental algorithm that adapts to the newly extended feature set and captures 'concept drifts' by removing features from previously selected and newly added ones. We expect that research on scalable feature selection will be extended to distributed and parallel computing and have impact on applications of data mining and machine learning.
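The abstract does not spell out the probabilistic algorithm, so the following is only a minimal sketch of one way a randomized feature-subset search could look. It assumes discrete feature values and uses an inconsistency-rate check as the acceptance criterion. The function names, the `threshold` and `max_tries` parameters, and the criterion itself are illustrative assumptions, not details taken from the paper.

```python
import random
from collections import Counter, defaultdict


def inconsistency_rate(rows, labels, subset):
    """Fraction of rows that disagree with the majority label among all rows
    sharing the same values on the chosen feature indices (assumed criterion)."""
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        key = tuple(row[i] for i in subset)
        groups[key][label] += 1
    # In each group, every row outside the majority class counts as inconsistent.
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return inconsistent / len(rows)


def random_subset_search(rows, labels, max_tries=1000, threshold=0.0, seed=0):
    """Draw random feature subsets and keep the smallest one whose
    inconsistency rate stays at or below `threshold` (hypothetical parameter)."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    best = list(range(n_features))  # start from the full feature set
    for _ in range(max_tries):
        # Only sizes up to the current best can improve on it.
        size = rng.randint(1, len(best))
        candidate = sorted(rng.sample(range(n_features), size))
        if (len(candidate) < len(best)
                and inconsistency_rate(rows, labels, candidate) <= threshold):
            best = candidate
    return best
```

Calling `random_subset_search(rows, labels)` on a list of equal-length tuples returns a sorted list of selected feature indices. Because each candidate is checked independently of the others, a search of this shape could in principle be run on a sample of the data first and then verified on the rest, which is the kind of scaling concern the abstract raises for large data sizes.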

Original language: English (US)
Pages (from-to): 333-339
Number of pages: 7
Journal: Expert Systems With Applications
Volume: 15
Issue number: 3-4
State: Published - Jan 1 1998

Keywords

  • Features
  • Large databases
  • Pattern classification
  • Probabilistic selection
  • Scalability

ASJC Scopus subject areas

  • Engineering (all)
  • Computer Science Applications
  • Artificial Intelligence
