Active feature selection using classes

Huan Liu; Lei Yu; Manoranjan Dash; Hiroshi Motoda

doi:10.1007/3-540-36175-8_48

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

16 Scopus citations

Abstract

Feature selection is frequently used in data pre-processing for data mining. When the training data set is too large, sampling is commonly used to overcome the difficulty. This work investigates the applicability of active sampling in feature selection in a filter model setting. Our objective is to partition data by taking advantage of class information so as to achieve the same or better performance for feature selection with fewer but more relevant instances than random sampling. Two versions of active feature selection that employ class information are proposed and empirically evaluated. In comparison with random sampling, we conduct extensive experiments with benchmark data sets, and analyze reasons why class-based active feature selection works in the way it does. The results will help us deal with large data sets and provide ideas to scale up other feature selection algorithms.
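
The idea of partitioning data by class and drawing fewer instances from each partition can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the per-class sampling fraction, and the simple class-mean-gap score (a stand-in for a real filter criterion) are all assumptions made for illustration only.

```python
import random
from collections import defaultdict


def class_partitioned_sample(labels, fraction, seed=0):
    """Return sorted row indices drawn separately from each class partition.

    Hypothetical helper: `fraction` (0 < fraction <= 1) is the share of
    each class to keep; it is an assumption, not a parameter from the paper.
    """
    rng = random.Random(seed)
    partitions = defaultdict(list)
    for idx, label in enumerate(labels):
        partitions[label].append(idx)
    chosen = []
    for members in partitions.values():
        # Keep at least one instance per class so no class disappears.
        k = min(len(members), max(1, round(fraction * len(members))))
        chosen.extend(rng.sample(members, k))
    return sorted(chosen)


def class_mean_gap_scores(rows, labels, indices):
    """Score each feature by the spread of its per-class means.

    Illustrative filter-style score computed only on the sampled rows;
    a larger gap suggests the feature separates the classes better.
    """
    n_features = len(rows[0])
    totals, counts = {}, {}
    for i in indices:
        label = labels[i]
        if label not in totals:
            totals[label] = [0.0] * n_features
            counts[label] = 0
        counts[label] += 1
        for j, value in enumerate(rows[i]):
            totals[label][j] += value
    scores = []
    for j in range(n_features):
        means = [totals[label][j] / counts[label] for label in totals]
        scores.append(max(means) - min(means))
    return scores


# Toy data: feature 0 separates the classes, feature 1 does not.
rows = [[0.0, 3.0], [0.1, 3.1], [0.2, 2.9], [0.1, 3.0], [1.0, 3.0], [0.9, 3.1]]
labels = ["a", "a", "a", "a", "b", "b"]
idx = class_partitioned_sample(labels, fraction=0.5, seed=1)
scores = class_mean_gap_scores(rows, labels, idx)
```

Sampling within each class, rather than over the whole data set, guarantees that every class contributes instances to the score even when the sample is small, which plain random sampling does not.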

Original language	English (US)
Title of host publication	Advances in Knowledge Discovery and Data Mining
Editors	Kyu-Young Whang, Jongwoo Jeon, Kyuseok Shim, Jaideep Srivastava
Publisher	Springer Verlag
Pages	474-485
Number of pages	12
ISBN (Electronic)	3540047603, 9783540047605
DOIs	https://doi.org/10.1007/3-540-36175-8_48
State	Published - 2003
Event	7th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2003 - Seoul, Korea, Republic of; Duration: Apr 30 2003 → May 2 2003

Publication series

Name	Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science)
Volume	2637
ISSN (Print)	0302-9743

Conference

Conference	7th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2003
Country/Territory	Korea, Republic of
City	Seoul
Period	4/30/03 → 5/2/03

ASJC Scopus subject areas

Theoretical Computer Science
General Computer Science

Cite this

Active feature selection using classes. / Liu, Huan; Yu, Lei; Dash, Manoranjan et al.
Advances in Knowledge Discovery and Data Mining. ed. / Kyu-Young Whang; Jongwoo Jeon; Kyuseok Shim; Jaideep Srivastava. Springer Verlag, 2003. p. 474-485 (Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science); Vol. 2637).

Liu, H, Yu, L, Dash, M & Motoda, H 2003, Active feature selection using classes. in K-Y Whang, J Jeon, K Shim & J Srivastava (eds), Advances in Knowledge Discovery and Data Mining. Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science), vol. 2637, Springer Verlag, pp. 474-485, 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2003, Seoul, Korea, Republic of, 4/30/03. https://doi.org/10.1007/3-540-36175-8_48

@inproceedings{bceedd1b32954c2f9e1a249d1e1b7575,

title = "Active feature selection using classes",

author = "Huan Liu and Lei Yu and Manoranjan Dash and Hiroshi Motoda",

note = "Publisher Copyright: {\textcopyright} Springer-Verlag Berlin Heidelberg 2003.; 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2003 ; Conference date: 30-04-2003 Through 02-05-2003",

year = "2003",

doi = "10.1007/3-540-36175-8_48",

language = "English (US)",

series = "Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science)",

publisher = "Springer Verlag",

pages = "474--485",

editor = "Kyu-Young Whang and Jongwoo Jeon and Kyuseok Shim and Jaideep Srivastava",

booktitle = "Advances in Knowledge Discovery and Data Mining",

}

TY - GEN

T1 - Active feature selection using classes

AU - Liu, Huan

AU - Yu, Lei

AU - Dash, Manoranjan

AU - Motoda, Hiroshi

N1 - Publisher Copyright: © Springer-Verlag Berlin Heidelberg 2003.

PY - 2003

Y1 - 2003

UR - http://www.scopus.com/inward/record.url?scp=7444252802&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=7444252802&partnerID=8YFLogxK

U2 - 10.1007/3-540-36175-8_48

DO - 10.1007/3-540-36175-8_48

M3 - Conference contribution

AN - SCOPUS:7444252802

T3 - Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science)

SP - 474

EP - 485

BT - Advances in Knowledge Discovery and Data Mining

A2 - Whang, Kyu-Young

A2 - Jeon, Jongwoo

A2 - Shim, Kyuseok

A2 - Srivastava, Jaideep

PB - Springer Verlag

T2 - 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2003

Y2 - 30 April 2003 through 2 May 2003

ER -
