Abstract

Gene selection aims to detect biologically relevant genes to assist biologists' research. The cDNA microarray data used in gene selection are usually "wide": with more than ten thousand genes but fewer than a hundred samples, many biologically irrelevant genes can appear statistically relevant purely by chance. Moreover, even among genes that are biologically relevant, biologists often prefer the "trigger" to the "fire". Addressing these problems goes beyond what cDNA microarray data can offer and requires additional information. Recent developments in bioinformatics have made various knowledge sources available, such as the KEGG pathway repository and the Gene Ontology database. Integrating different types of knowledge for gene selection could provide more information about genes and samples. In this work, we propose a novel framework that integrates different types of knowledge to identify biologically relevant genes. The framework converts different types of external knowledge into its internal knowledge, which can then be used to rank genes. After obtaining the resulting ranking lists, it aggregates them via a probabilistic model to produce a final ranking list. Experimental results from our study on acute lymphoblastic leukemia demonstrate the novelty and efficacy of the proposed framework and show that using different types of knowledge together can help detect biologically relevant genes.
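The abstract describes a two-stage pipeline: each knowledge source yields a ranking of genes, and those rankings are then merged into one final list. The paper's probabilistic aggregation model is not specified in this record, so the sketch below swaps in a plain Borda-style average-rank aggregation purely for illustration; it is not the authors' method. The source names (`expression`, `pathway`, `ontology`), the gene IDs `g1` to `g4`, the score values, and the helpers `scores_to_ranking` and `borda_aggregate` are all hypothetical.

```python
from typing import Dict, List


def scores_to_ranking(scores: Dict[str, float]) -> List[str]:
    """Order genes from most to least relevant (higher score = more relevant).

    Ties are broken alphabetically so the result is deterministic.
    """
    return sorted(scores, key=lambda g: (-scores[g], g))


def borda_aggregate(rankings: List[List[str]]) -> List[str]:
    """Merge several ranking lists by mean 1-based position (Borda-style).

    Illustrative stand-in for a probabilistic aggregator. Assumes at least
    one ranking and that every ranking covers the same set of genes.
    """
    genes = set(rankings[0])
    if any(set(r) != genes for r in rankings):
        raise ValueError("all rankings must cover the same genes")
    # Average position of each gene across all lists; lower is better.
    mean_pos = {
        g: sum(r.index(g) + 1 for r in rankings) / len(rankings) for g in genes
    }
    return sorted(genes, key=lambda g: (mean_pos[g], g))


# Hypothetical per-gene relevance scores from three knowledge sources (toy values).
source_scores = {
    "expression": {"g1": 0.9, "g2": 0.8, "g3": 0.1, "g4": 0.5},
    "pathway": {"g1": 0.2, "g2": 0.7, "g3": 0.6, "g4": 0.9},
    "ontology": {"g1": 0.4, "g2": 0.9, "g3": 0.3, "g4": 0.8},
}

# Stage 1: one ranking list per knowledge source.
rankings = [scores_to_ranking(s) for s in source_scores.values()]
# Stage 2: aggregate the lists into a single final ranking.
final = borda_aggregate(rankings)
print(final)  # ['g2', 'g4', 'g1', 'g3']
```

Averaging positions gives every source equal weight and discards how far apart the underlying scores are; that simplicity is the main reason a more principled aggregation, such as a probabilistic one, may be preferred in practice.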

Original language: English (US)
Title of host publication: ICDM Workshops 2009 - IEEE International Conference on Data Mining
Pages: 88-93
Number of pages: 6
DOI: 10.1109/ICDMW.2009.21
State: Published - 2009
Event: 2009 IEEE International Conference on Data Mining Workshops, ICDMW 2009 - Miami, FL, United States
Duration: Dec 6 2009 to Dec 6 2009


ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Vision and Pattern Recognition
  • Software

Cite this

Zhao, Z., Sharma, S., Agarwal, N., Liu, H., Wang, J., & Chang, Y. (2009). Integrating knowledge in search of biologically relevant genes. In ICDM Workshops 2009 - IEEE International Conference on Data Mining (pp. 88-93). [5360522] https://doi.org/10.1109/ICDMW.2009.21

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

@inproceedings{3683fe17231f4b60bd2d313e1814c8ac,
title = "Integrating knowledge in search of biologically relevant genes",
abstract = "Gene selection aims to detect biologically relevant genes to assist biologists' research. The cDNA microarray data used in gene selection are usually {"}wide{"}: with more than ten thousand genes but fewer than a hundred samples, many biologically irrelevant genes can appear statistically relevant purely by chance. Moreover, even among genes that are biologically relevant, biologists often prefer the {"}trigger{"} to the {"}fire{"}. Addressing these problems goes beyond what cDNA microarray data can offer and requires additional information. Recent developments in bioinformatics have made various knowledge sources available, such as the KEGG pathway repository and the Gene Ontology database. Integrating different types of knowledge for gene selection could provide more information about genes and samples. In this work, we propose a novel framework that integrates different types of knowledge to identify biologically relevant genes. The framework converts different types of external knowledge into its internal knowledge, which can then be used to rank genes. After obtaining the resulting ranking lists, it aggregates them via a probabilistic model to produce a final ranking list. Experimental results from our study on acute lymphoblastic leukemia demonstrate the novelty and efficacy of the proposed framework and show that using different types of knowledge together can help detect biologically relevant genes.",
author = "Zheng Zhao and Shashvata Sharma and Nitin Agarwal and Huan Liu and Jiangxin Wang and Yung Chang",
year = "2009",
doi = "10.1109/ICDMW.2009.21",
language = "English (US)",
isbn = "9780769539027",
pages = "88--93",
booktitle = "ICDM Workshops 2009 - IEEE International Conference on Data Mining",

}

TY - GEN

T1 - Integrating knowledge in search of biologically relevant genes

AU - Zhao, Zheng

AU - Sharma, Shashvata

AU - Agarwal, Nitin

AU - Liu, Huan

AU - Wang, Jiangxin

AU - Chang, Yung

PY - 2009

Y1 - 2009

AB - Gene selection aims to detect biologically relevant genes to assist biologists' research. The cDNA microarray data used in gene selection are usually "wide": with more than ten thousand genes but fewer than a hundred samples, many biologically irrelevant genes can appear statistically relevant purely by chance. Moreover, even among genes that are biologically relevant, biologists often prefer the "trigger" to the "fire". Addressing these problems goes beyond what cDNA microarray data can offer and requires additional information. Recent developments in bioinformatics have made various knowledge sources available, such as the KEGG pathway repository and the Gene Ontology database. Integrating different types of knowledge for gene selection could provide more information about genes and samples. In this work, we propose a novel framework that integrates different types of knowledge to identify biologically relevant genes. The framework converts different types of external knowledge into its internal knowledge, which can then be used to rank genes. After obtaining the resulting ranking lists, it aggregates them via a probabilistic model to produce a final ranking list. Experimental results from our study on acute lymphoblastic leukemia demonstrate the novelty and efficacy of the proposed framework and show that using different types of knowledge together can help detect biologically relevant genes.

UR - http://www.scopus.com/inward/record.url?scp=77951156345&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77951156345&partnerID=8YFLogxK

U2 - 10.1109/ICDMW.2009.21

DO - 10.1109/ICDMW.2009.21

M3 - Conference contribution

AN - SCOPUS:77951156345

SN - 9780769539027

SP - 88

EP - 93

BT - ICDM Workshops 2009 - IEEE International Conference on Data Mining

ER -