Redundancy based feature selection for microarray data

Lei Yu, Huan Liu

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

147 Citations (Scopus)

Abstract

In gene expression microarray data analysis, selecting a small number of discriminative genes from thousands of genes is an important problem for accurate classification of diseases or phenotypes. The problem becomes particularly challenging due to the large number of features (genes) and small sample size. Traditional gene selection methods often select the top-ranked genes according to their individual discriminative power without handling the high degree of redundancy among the genes. Recent research shows that removing redundant genes among selected ones can achieve a better representation of the characteristics of the targeted phenotypes and lead to improved classification accuracy. Hence, we study in this paper the relationship between feature relevance and redundancy and propose an efficient method that can effectively remove redundant genes. The efficiency and effectiveness of our method in comparison with representative methods have been demonstrated through an empirical study using public microarray data sets.
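The contrast drawn in the abstract, ranking genes by individual discriminative power versus also removing redundant genes, can be illustrated with a minimal Python sketch. This is not the authors' algorithm. It assumes absolute Pearson correlation as both the relevance measure (gene vs. class label) and the redundancy measure (gene vs. gene), and it treats a gene as redundant when an already selected, at least as relevant gene correlates with it at least as strongly as the gene correlates with the class. The names `pearson`, `select_non_redundant`, and the `threshold` parameter are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_non_redundant(
    genes: List[Sequence[float]],
    labels: Sequence[float],
    threshold: float = 0.0,
) -> List[int]:
    """Return indices of genes kept after relevance ranking plus redundancy removal.

    genes[i] is the expression profile of gene i across all samples;
    labels holds the numeric class label of each sample.
    """
    # Relevance (assumed measure): |correlation| of each gene with the class label.
    relevance = [abs(pearson(g, labels)) for g in genes]

    # Traditional step: rank genes by individual relevance, most relevant first.
    # sorted() is stable even with reverse=True, so ties keep their original order.
    ranked = sorted(
        (i for i in range(len(genes)) if relevance[i] > threshold),
        key=lambda i: relevance[i],
        reverse=True,
    )

    selected: List[int] = []
    for i in ranked:
        # Redundancy check: drop gene i if an already selected (hence at least as
        # relevant) gene correlates with it at least as strongly as i does with the class.
        redundant = any(
            abs(pearson(genes[i], genes[j])) >= relevance[i] for j in selected
        )
        if not redundant:
            selected.append(i)
    return selected


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 1]
    genes = [
        [1, 2, 1, 5, 6, 5],     # strongly tracks the label
        [2, 4, 2, 10, 12, 10],  # scaled copy of gene 0: equally relevant, fully redundant
        [1, 0, 0, 1, 1, 0],     # weakly relevant, only weakly correlated with gene 0
    ]
    print(select_non_redundant(genes, labels))
```

On this toy data the call returns `[0, 2]`. A pure top-2 ranking would keep genes 0 and 1, since both have the highest individual relevance, even though gene 1 adds no information beyond gene 0; the redundancy check discards it and keeps the less relevant but non-redundant gene 2 instead.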

Original language: English (US)
Title of host publication: KDD-2004 - Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Editors: R. Kohavi, J. Gehrke, W. DuMouchel, J. Ghosh
Pages: 737-742
Number of pages: 6
State: Published, 2004
Event: KDD-2004 - Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, WA, United States
Duration: Aug 22, 2004 to Aug 25, 2004

Keywords

  • Feature redundancy
  • Gene selection
  • Microarray data

ASJC Scopus subject areas

  • Engineering (all)

Cite this

Yu, L., & Liu, H. (2004). Redundancy based feature selection for microarray data. In R. Kohavi, J. Gehrke, W. DuMouchel, & J. Ghosh (Eds.), KDD-2004 - Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 737-742).

@inproceedings{080b2303b6234acd89cdb18e9d998124,
title = "Redundancy based feature selection for microarray data",
keywords = "Feature redundancy, Gene selection, Microarray data",
author = "Lei Yu and Huan Liu",
year = "2004",
language = "English (US)",
isbn = "1581138881",
pages = "737--742",
editor = "R. Kohavi and J. Gehrke and W. DuMouchel and J. Ghosh",
booktitle = "KDD-2004 - Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",

}

TY - GEN

T1 - Redundancy based feature selection for microarray data

AU - Yu, Lei

AU - Liu, Huan

PY - 2004

Y1 - 2004

KW - Feature redundancy

KW - Gene selection

KW - Microarray data

UR - http://www.scopus.com/inward/record.url?scp=12244293653&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=12244293653&partnerID=8YFLogxK

M3 - Conference contribution

SN - 1581138881

SN - 9781581138887

SP - 737

EP - 742

BT - KDD-2004 - Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

A2 - Kohavi, R.

A2 - Gehrke, J.

A2 - DuMouchel, W.

A2 - Ghosh, J.

ER -