Consistency-based search in feature selection

Manoranjan Dash, Huan Liu

Research output: Contribution to journal (Article)

624 Scopus citations

Abstract

Feature selection is an effective technique for dimensionality reduction. For classification, it is used to find an "optimal" subset of relevant features such that the overall classification accuracy is increased while the data size is reduced and comprehensibility is improved. Feature selection methods involve two important aspects: evaluating a candidate feature subset and searching through the feature space. Existing algorithms adopt various measures to evaluate the goodness of feature subsets. This work focuses on the inconsistency measure, according to which a feature subset is inconsistent if there exist at least two instances with the same values on the selected features but different class labels. We compare the inconsistency measure with other measures and study different search strategies, such as exhaustive, complete, heuristic, and random search, that can be applied to this measure. We conduct an empirical study to examine the pros and cons of these search methods, give some guidelines on choosing a search method, and compare classifier error rates before and after feature selection.
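To make the inconsistency measure and the idea of searching over it more concrete, here is a minimal Python sketch. It assumes the usual way of turning the binary "inconsistent" criterion into a rate: instances are grouped by their values on the candidate subset, and within each group every instance outside the group's majority class counts as inconsistent. The names `inconsistency_rate`, `exhaustive_search`, and `random_search`, the default threshold (the rate of the full feature set), and the simplified random-search loop are hypothetical illustrations, not the authors' implementation.

```python
import random
from collections import Counter, defaultdict
from itertools import combinations
from typing import Hashable, Sequence, Tuple


def inconsistency_rate(data: Sequence[Sequence[Hashable]],
                       labels: Sequence[Hashable],
                       subset: Sequence[int]) -> float:
    """Share of instances that are inconsistent when projected onto `subset`."""
    groups = defaultdict(Counter)  # projected feature pattern -> class counts
    for row, label in zip(data, labels):
        groups[tuple(row[i] for i in subset)][label] += 1
    # In each group, instances outside the majority class are inconsistent.
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return inconsistent / len(data)


def exhaustive_search(data, labels, threshold=None) -> Tuple[int, ...]:
    """Try subsets in order of increasing size; return the first (hence a
    smallest) subset whose inconsistency rate is at most `threshold`."""
    n = len(data[0])
    if threshold is None:  # assumption: match the full feature set's rate
        threshold = inconsistency_rate(data, labels, range(n))
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            if inconsistency_rate(data, labels, subset) <= threshold:
                return subset
    return tuple(range(n))  # reached only if no subset meets `threshold`


def random_search(data, labels, max_tries=1000, threshold=None, seed=0):
    """Hypothetical simplified random search: sample subsets strictly smaller
    than the current best and keep any that still meet the threshold."""
    rng = random.Random(seed)
    n = len(data[0])
    if threshold is None:  # assumption: match the full feature set's rate
        threshold = inconsistency_rate(data, labels, range(n))
    best = tuple(range(n))
    for _ in range(max_tries):
        if not best:  # the empty subset cannot be improved on
            break
        size = rng.randrange(len(best))  # strictly smaller than current best
        candidate = tuple(sorted(rng.sample(range(n), size)))
        if inconsistency_rate(data, labels, candidate) <= threshold:
            best = candidate
    return best
```

On a toy dataset such as `data = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]` with `labels = [0, 0, 1, 1]`, feature 1 alone has an inconsistency rate of 0.5, while `exhaustive_search` returns `(0,)`. The trade-off shows up directly in the code: the exhaustive search is guaranteed to find a smallest qualifying subset but may evaluate up to 2^N subsets, whereas the random search evaluates at most `max_tries` subsets and may miss the optimum.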

Original language: English (US)
Pages (from-to): 155-176
Number of pages: 22
Journal: Artificial Intelligence
Volume: 151
Issue number: 1-2
State: Published - Dec 1 2003

Keywords

  • Branch and bound
  • Classification
  • Evaluation measures
  • Feature selection
  • Random search
  • Search strategies

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
  • Artificial Intelligence
