Redundancy based feature selection for microarray data

Lei Yu; Huan Liu

doi:10.1145/1014052.1014149

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

178 Scopus citations

Abstract

In gene expression microarray data analysis, selecting a small number of discriminative genes from thousands of genes is an important problem for accurate classification of diseases or phenotypes. The problem becomes particularly challenging due to the large number of features (genes) and small sample size. Traditional gene selection methods often select the top-ranked genes according to their individual discriminative power without handling the high degree of redundancy among the genes. Recent research shows that removing redundant genes among selected ones can achieve a better representation of the characteristics of the targeted phenotypes and lead to improved classification accuracy. Hence, we study in this paper the relationship between feature relevance and redundancy and propose an efficient method that can effectively remove redundant genes. The efficiency and effectiveness of our method in comparison with representative methods have been demonstrated through an empirical study using public microarray data sets.

Original language	English (US)
Title of host publication	KDD-2004 - Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Publisher	Association for Computing Machinery
Pages	737-742
Number of pages	6
ISBN (Print)	1581138881, 9781581138887
DOIs	https://doi.org/10.1145/1014052.1014149
State	Published - 2004
Event	KDD-2004 - Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, WA, United States
Duration	Aug 22 2004 → Aug 25 2004

Publication series

Name	KDD-2004 - Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Other

Other	KDD-2004 - Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Country/Territory	United States
City	Seattle, WA
Period	Aug 22 2004 → Aug 25 2004

Keywords

Feature redundancy
Gene selection
Microarray data

ASJC Scopus subject areas

General Engineering

Cite this

Yu, L., & Liu, H. (2004). Redundancy based feature selection for microarray data. In KDD-2004 - Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 737-742). (KDD-2004 - Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining). Association for Computing Machinery. https://doi.org/10.1145/1014052.1014149

Redundancy based feature selection for microarray data. / Yu, Lei; Liu, Huan.
KDD-2004 - Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, 2004. p. 737-742 (KDD-2004 - Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining).

Yu, L & Liu, H 2004, Redundancy based feature selection for microarray data. in KDD-2004 - Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD-2004 - Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Association for Computing Machinery, pp. 737-742, KDD-2004 - Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, WA, United States, 8/22/04. https://doi.org/10.1145/1014052.1014149

Yu L, Liu H. Redundancy based feature selection for microarray data. In KDD-2004 - Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery. 2004. p. 737-742. (KDD-2004 - Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining). doi: 10.1145/1014052.1014149

@inproceedings{080b2303b6234acd89cdb18e9d998124,

title = "Redundancy based feature selection for microarray data",

abstract = "In gene expression microarray data analysis, selecting a small number of discriminative genes from thousands of genes is an important problem for accurate classification of diseases or phenotypes. The problem becomes particularly challenging due to the large number of features (genes) and small sample size. Traditional gene selection methods often select the top-ranked genes according to their individual discriminative power without handling the high degree of redundancy among the genes. Latest research shows that removing redundant genes among selected ones can achieve a better representation of the characteristics of the targeted phenotypes and lead to improved classification accuracy. Hence, we study in this paper the relationship between feature relevance and redundancy and propose an efficient method that can effectively remove redundant genes. The efficiency and effectiveness of our method in comparison with representative methods has been demonstrated through an empirical study using public microarray data sets.",

keywords = "Feature redundancy, Gene selection, Microarray data",

author = "Lei Yu and Huan Liu",

year = "2004",

doi = "10.1145/1014052.1014149",

language = "English (US)",

isbn = "1581138881",

series = "KDD-2004 - Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",

publisher = "Association for Computing Machinery",

pages = "737--742",

booktitle = "KDD-2004 - Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",

note = "KDD-2004 - Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining ; Conference date: 22-08-2004 Through 25-08-2004",

}

TY - GEN

T1 - Redundancy based feature selection for microarray data

AU - Yu, Lei

AU - Liu, Huan

PY - 2004

Y1 - 2004

N2 - In gene expression microarray data analysis, selecting a small number of discriminative genes from thousands of genes is an important problem for accurate classification of diseases or phenotypes. The problem becomes particularly challenging due to the large number of features (genes) and small sample size. Traditional gene selection methods often select the top-ranked genes according to their individual discriminative power without handling the high degree of redundancy among the genes. Latest research shows that removing redundant genes among selected ones can achieve a better representation of the characteristics of the targeted phenotypes and lead to improved classification accuracy. Hence, we study in this paper the relationship between feature relevance and redundancy and propose an efficient method that can effectively remove redundant genes. The efficiency and effectiveness of our method in comparison with representative methods has been demonstrated through an empirical study using public microarray data sets.

AB - In gene expression microarray data analysis, selecting a small number of discriminative genes from thousands of genes is an important problem for accurate classification of diseases or phenotypes. The problem becomes particularly challenging due to the large number of features (genes) and small sample size. Traditional gene selection methods often select the top-ranked genes according to their individual discriminative power without handling the high degree of redundancy among the genes. Latest research shows that removing redundant genes among selected ones can achieve a better representation of the characteristics of the targeted phenotypes and lead to improved classification accuracy. Hence, we study in this paper the relationship between feature relevance and redundancy and propose an efficient method that can effectively remove redundant genes. The efficiency and effectiveness of our method in comparison with representative methods has been demonstrated through an empirical study using public microarray data sets.

KW - Feature redundancy

KW - Gene selection

KW - Microarray data

UR - http://www.scopus.com/inward/record.url?scp=12244293653&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=12244293653&partnerID=8YFLogxK

U2 - 10.1145/1014052.1014149

DO - 10.1145/1014052.1014149

M3 - Conference contribution

AN - SCOPUS:12244293653

SN - 1581138881

SN - 9781581138887

T3 - KDD-2004 - Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

SP - 737

EP - 742

BT - KDD-2004 - Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

PB - Association for Computing Machinery

T2 - KDD-2004 - Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Y2 - 22 August 2004 through 25 August 2004

ER -
