Active feature selection using classes

Huan Liu, Lei Yu, Manoranjan Dash, Hiroshi Motoda

Research output: Conference contribution (Chapter in Book/Report/Conference proceeding)

14 Citations (Scopus)

Abstract

Feature selection is frequently used in data preprocessing for data mining. When the training data set is too large, sampling is commonly used to overcome the difficulty. This work investigates the applicability of active sampling to feature selection in a filter model setting. Our objective is to partition the data by taking advantage of class information so that feature selection achieves the same or better performance with fewer but more relevant instances than random sampling. Two versions of active feature selection that employ class information are proposed and empirically evaluated. We conduct extensive experiments on benchmark data sets comparing against random sampling, and analyze why class-based active feature selection works the way it does. The results will help in dealing with large data sets and provide ideas for scaling up other feature selection algorithms.
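
To make the idea concrete, the sketch below contrasts plain random sampling with class-stratified sampling ahead of a simple filter-style feature ranking. It is only an illustration of the general approach described in the abstract, not the authors' algorithm: the synthetic data, the function names, and the per-feature score (absolute difference of class-conditional means) are all assumptions made for this example.

import numpy as np

rng = np.random.default_rng(0)

def random_sample(X, y, n):
    # Draw n instances uniformly at random, ignoring class labels.
    idx = rng.choice(len(y), size=n, replace=False)
    return X[idx], y[idx]

def class_based_sample(X, y, n):
    # Draw n instances while preserving the class proportions of y
    # (one simple way of "taking advantage of class information").
    idx = []
    classes, counts = np.unique(y, return_counts=True)
    for c, count in zip(classes, counts):
        quota = max(1, round(n * count / len(y)))      # per-class quota
        pool = np.flatnonzero(y == c)
        idx.extend(rng.choice(pool, size=min(quota, len(pool)), replace=False))
    idx = np.asarray(idx)
    return X[idx], y[idx]

def filter_scores(X, y):
    # Placeholder filter measure: absolute difference of class-conditional
    # feature means; the paper's actual selection measure may differ.
    return np.abs(X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0))

# Synthetic two-class data: only the first 3 of 10 features are informative.
n_total, n_features = 2000, 10
y = (rng.random(n_total) < 0.2).astype(int)            # imbalanced classes
X = rng.normal(size=(n_total, n_features))
X[:, :3] += y[:, None] * 1.5                           # shift informative features by class

for name, sampler in [("random", random_sample), ("class-based", class_based_sample)]:
    Xs, ys = sampler(X, y, n=100)
    top3 = sorted(int(i) for i in np.argsort(filter_scores(Xs, ys))[-3:])
    print(f"{name:11s} sample of 100 -> top-3 ranked features: {top3}")

Swapping in a different filter measure or a smaller sample size is the natural way to probe when the class-based sample keeps ranking the informative features correctly while the purely random sample starts to miss them.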

Original language: English (US)
Title of host publication: Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science)
Editors: K.-Y. Whang, J. Jeon, K. Shim, J. Srivastava
Pages: 474-485
Number of pages: 12
Volume: 2637
State: Published - 2003
Event: 7th Pacific-Asia Conference, PAKDD 2003 - Seoul, Korea, Republic of
Duration: Apr 30, 2003 - May 2, 2003

Fingerprint

  • Feature extraction
  • Sampling
  • Data mining
  • Processing
  • Experiments

ASJC Scopus subject areas

  • Hardware and Architecture

Cite this

Liu, H., Yu, L., Dash, M., & Motoda, H. (2003). Active feature selection using classes. In K.-Y. Whang, J. Jeon, K. Shim, & J. Srivastava (Eds.), Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science) (Vol. 2637, pp. 474-485).
