Abstract

Gene selection aims to detect biologically relevant genes to assist biologists' research. The cDNA microarray data used in gene selection are usually "wide": with several thousand genes or more but fewer than a hundred samples, many biologically irrelevant genes can appear statistically relevant by sheer chance. Addressing this problem goes beyond what cDNA microarray data alone can offer and requires additional information. Recent developments in bioinformatics have made various knowledge sources available, such as the KEGG pathway repository and the Gene Ontology database. Integrating different types of knowledge can provide more information about genes and samples. In this work, we propose a novel approach that integrates different types of knowledge to identify biologically relevant genes. The approach converts each type of external knowledge into internal knowledge that can be used to rank genes. After obtaining these ranking lists, it aggregates them via a probabilistic model to generate a final list. Experimental results from our study on acute lymphoblastic leukemia demonstrate the efficacy of the proposed approach and show that using different types of knowledge together can help detect biologically relevant genes.
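The abstract does not specify the probabilistic model used to merge the per-source ranking lists. As a rough illustration of the general idea only, the sketch below substitutes a different, well-known probabilistic rank aggregation technique, robust rank aggregation (RRA). This is not the authors' model. RRA scores each gene by how unlikely its normalized ranks would be if every list ranked genes uniformly at random. The function names, the three input lists, and the gene labels are hypothetical.

```python
from math import comb


def order_stat_cdf(k: int, n: int, x: float) -> float:
    """P(U_(k) <= x): chance that the k-th smallest of n i.i.d. Uniform(0,1) draws is <= x."""
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(k, n + 1))


def rra_scores(ranked_lists: list[list[str]]) -> dict[str, float]:
    """Simplified RRA score per gene; smaller means consistently near the top.

    Assumption (not from the paper): a gene missing from a list is placed at the
    bottom of that list (normalized rank 1.0).
    """
    genes = set().union(*ranked_lists)
    scores = {}
    for gene in genes:
        # Normalized rank of the gene in each list, in (0, 1].
        norm = [
            (lst.index(gene) + 1) / len(lst) if gene in lst else 1.0
            for lst in ranked_lists
        ]
        norm.sort()
        n = len(norm)
        # Take the most significant order statistic across k = 1..n.
        scores[gene] = min(order_stat_cdf(k, n, r) for k, r in enumerate(norm, start=1))
    return scores


def aggregate(ranked_lists: list[list[str]]) -> list[str]:
    """Final list: genes sorted by RRA score, ties broken by name."""
    scores = rra_scores(ranked_lists)
    return sorted(scores, key=lambda g: (scores[g], g))


if __name__ == "__main__":
    # Hypothetical rankings, e.g. from expression data, pathway knowledge, and GO annotations.
    lists = [["G1", "G2", "G3", "G4"], ["G2", "G1", "G4", "G3"], ["G1", "G3", "G2", "G4"]]
    print(aggregate(lists))  # ['G1', 'G2', 'G3', 'G4']
```

In this toy input, G1 is ranked first or second by every source, so it receives the smallest score (0.125) and heads the final list. A probabilistic score has one advantage over simply averaging positions: it rewards genes that several independent sources place near the top, while one poor ranking from a single source does not sink a gene.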

Original language: English (US)
Title of host publication: Proceedings of the 10th SIAM International Conference on Data Mining, SDM 2010
Pages: 838-849
Number of pages: 12
State: Published - 2010
Event: 10th SIAM International Conference on Data Mining, SDM 2010 - Columbus, OH, United States
Duration: Apr 29 2010 to May 1 2010


ASJC Scopus subject areas

  • Software


Cite this

    Zhao, Z., Wang, J., Sharma, S., Agarwal, N., Liu, H., & Chang, Y. (2010). An integrative approach to identifying biologically relevant genes. In Proceedings of the 10th SIAM International Conference on Data Mining, SDM 2010 (pp. 838-849).