Consistency-based search in feature selection

Manoranjan Dash; Huan Liu

doi:10.1016/S0004-3702(03)00079-1

Research output: Contribution to journal › Article › peer-review

771 Scopus citations

Abstract

Feature selection is an effective technique for dimensionality reduction. For classification, it is used to find an "optimal" subset of relevant features such that the overall accuracy of classification is increased while the data size is reduced and the comprehensibility is improved. Feature selection methods have two important aspects: evaluation of a candidate feature subset and search through the feature space. Existing algorithms adopt various measures to evaluate the goodness of feature subsets. This work focuses on the inconsistency measure, according to which a feature subset is inconsistent if there exist at least two instances with the same feature values but different class labels. We compare the inconsistency measure with other measures and study different search strategies, such as exhaustive, complete, heuristic, and random search, that can be applied to this measure. We conduct an empirical study to examine the pros and cons of these search methods, give some guidelines on choosing a search method, and compare the classifier error rates before and after feature selection.
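
The inconsistency criterion described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that inconsistency is quantified as a rate: for each distinct pattern of feature values, the instances that do not belong to that pattern's majority class are counted, and the total is divided by the number of instances. The function names `inconsistency_rate` and `smallest_consistent_subset`, the `threshold` parameter, and the toy data are all hypothetical. The second function shows only the simplest of the listed strategies, an exhaustive search by increasing subset size.

```python
from collections import Counter, defaultdict
from itertools import combinations


def inconsistency_rate(rows, labels, subset):
    """Fraction of instances outside the majority class of their pattern.

    rows   : non-empty sequence of equal-length tuples of discrete values
    labels : class label for each row
    subset : tuple of feature indices to keep
    """
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        pattern = tuple(row[i] for i in subset)  # projected feature values
        groups[pattern][label] += 1
    # Within each pattern, everything except the majority class is inconsistent.
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return inconsistent / len(rows)


def smallest_consistent_subset(rows, labels, threshold=0.0):
    """Exhaustive search: return the first subset, by increasing size,
    whose inconsistency rate does not exceed the threshold (assumed parameter)."""
    n_features = len(rows[0])
    for size in range(n_features + 1):
        for subset in combinations(range(n_features), size):
            if inconsistency_rate(rows, labels, subset) <= threshold:
                return subset
    return None  # even the full feature set is too inconsistent


# Toy data (made up for illustration): feature 0 alone determines the class.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0), (0, 0, 0)]
labels = ["a", "a", "b", "b", "a"]

print(inconsistency_rate(rows, labels, (2,)))
print(smallest_consistent_subset(rows, labels))
```

Exhaustive enumeration visits up to 2^n subsets for n features, which is why the choice of search strategy matters in practice; heuristic or random search trades the guarantee of a smallest subset for lower cost.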

Original language	English (US)
Pages (from-to)	155-176
Number of pages	22
Journal	Artificial Intelligence
Volume	151
Issue number	1-2
DOIs	https://doi.org/10.1016/S0004-3702(03)00079-1
State	Published - Dec 2003

Keywords

Branch and bound
Classification
Evaluation measures
Feature selection
Random search
Search strategies

ASJC Scopus subject areas

Language and Linguistics
Linguistics and Language
Artificial Intelligence

Cite this

@article{fc51c1483a77469988407b6eca4e9196,
  title = "Consistency-based search in feature selection",
  abstract = "Feature selection is an effective technique in dealing with dimensionality reduction. For classification, it is used to find an {"}optimal{"} subset of relevant features such that the overall accuracy of classification is increased while the data size is reduced and the comprehensibility is improved. Feature selection methods contain two important aspects: evaluation of a candidate feature subset and search through the feature space. Existing algorithms adopt various measures to evaluate the goodness of feature subsets. This work focuses on inconsistency measure according to which a feature subset is inconsistent if there exist at least two instances with same feature values but with different class labels. We compare inconsistency measure with other measures and study different search strategies such as exhaustive, complete, heuristic and random search, that can be applied to this measure. We conduct an empirical study to examine the pros and cons of these search methods, give some guidelines on choosing a search method, and compare the classifier error rates before and after feature selection.",
  keywords = "Branch and bound, Classification, Evaluation measures, Feature selection, Random search, Search strategies",
  author = "Manoranjan Dash and Huan Liu",
  year = "2003",
  month = dec,
  doi = "10.1016/S0004-3702(03)00079-1",
  language = "English (US)",
  volume = "151",
  pages = "155--176",
  journal = "Artificial Intelligence",
  issn = "0004-3702",
  publisher = "Elsevier",
  number = "1-2",
}

TY  - JOUR
T1  - Consistency-based search in feature selection
AU  - Dash, Manoranjan
AU  - Liu, Huan
PY  - 2003/12
Y1  - 2003/12
N2  - Feature selection is an effective technique in dealing with dimensionality reduction. For classification, it is used to find an "optimal" subset of relevant features such that the overall accuracy of classification is increased while the data size is reduced and the comprehensibility is improved. Feature selection methods contain two important aspects: evaluation of a candidate feature subset and search through the feature space. Existing algorithms adopt various measures to evaluate the goodness of feature subsets. This work focuses on inconsistency measure according to which a feature subset is inconsistent if there exist at least two instances with same feature values but with different class labels. We compare inconsistency measure with other measures and study different search strategies such as exhaustive, complete, heuristic and random search, that can be applied to this measure. We conduct an empirical study to examine the pros and cons of these search methods, give some guidelines on choosing a search method, and compare the classifier error rates before and after feature selection.
AB  - Feature selection is an effective technique in dealing with dimensionality reduction. For classification, it is used to find an "optimal" subset of relevant features such that the overall accuracy of classification is increased while the data size is reduced and the comprehensibility is improved. Feature selection methods contain two important aspects: evaluation of a candidate feature subset and search through the feature space. Existing algorithms adopt various measures to evaluate the goodness of feature subsets. This work focuses on inconsistency measure according to which a feature subset is inconsistent if there exist at least two instances with same feature values but with different class labels. We compare inconsistency measure with other measures and study different search strategies such as exhaustive, complete, heuristic and random search, that can be applied to this measure. We conduct an empirical study to examine the pros and cons of these search methods, give some guidelines on choosing a search method, and compare the classifier error rates before and after feature selection.
KW  - Branch and bound
KW  - Classification
KW  - Evaluation measures
KW  - Feature selection
KW  - Random search
KW  - Search strategies
UR  - http://www.scopus.com/inward/record.url?scp=0242302657&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=0242302657&partnerID=8YFLogxK
U2  - 10.1016/S0004-3702(03)00079-1
DO  - 10.1016/S0004-3702(03)00079-1
M3  - Article
AN  - SCOPUS:0242302657
SN  - 0004-3702
VL  - 151
SP  - 155
EP  - 176
JO  - Artificial Intelligence
JF  - Artificial Intelligence
IS  - 1-2
ER  -
