TY - JOUR
T1 - A selective sampling approach to active feature selection
AU - Liu, Huan
AU - Motoda, Hiroshi
AU - Yu, Lei
N1 - Funding Information:
We thank Bret Ehlert, Feifang Hu, Manoranjan Dash, Hongjun Lu, and Lance Parsons for their contributions to this work. We are grateful to the anonymous reviewers who provided many helpful and constructive suggestions on an earlier version of this paper. An earlier short version of this work was published in the proceedings of the 19th International Conference on Machine Learning, 2002 [39]. This work is based in part on a project supported by the National Science Foundation under Grant No. IIS-0127815 for H. Liu, and on the Grant-in-Aid for Scientific Research on Priority Areas (B), No. 759: Active Mining Project, by the Ministry of Education, Culture, Sports, Science and Technology of Japan for H. Motoda.
PY - 2004/11
Y1 - 2004/11
AB - Feature selection, as a preprocessing step to machine learning, has proven effective in reducing dimensionality, removing irrelevant data, increasing learning accuracy, and improving result comprehensibility. Traditional feature selection methods resort to random sampling when dealing with data sets that have a huge number of instances. In this paper, we introduce the concept of active feature selection and investigate a selective sampling approach to active feature selection in a filter model setting. We present a formalism of selective sampling based on data variance and apply it to Relief, a widely used feature selection algorithm. Further, we show how it realizes active feature selection and reduces the number of training instances required, achieving time savings without performance deterioration. We design objective evaluation measures of performance, conduct extensive experiments on both synthetic and benchmark data sets, and observe consistent and significant improvement. We suggest further work based on our study and experiments.
KW - Dimensionality reduction
KW - Feature selection and ranking
KW - Learning
KW - Sampling
UR - http://www.scopus.com/inward/record.url?scp=4644347255&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=4644347255&partnerID=8YFLogxK
DO - 10.1016/j.artint.2004.05.009
M3 - Article
AN - SCOPUS:4644347255
SN - 0004-3702
VL - 159
SP - 49
EP - 74
JO - Artificial Intelligence
JF - Artificial Intelligence
IS - 1-2
ER -