Fuzzy c-means clustering with prior biological knowledge

Luis Tari; Chitta Baral; Seungchan Kim

doi:10.1016/j.jbi.2008.05.009

Research output: Contribution to journal › Article › peer-review

80 Scopus citations

Abstract

We propose a novel semi-supervised clustering method called GO Fuzzy c-means, which enables the simultaneous use of biological knowledge and gene expression data in a probabilistic clustering algorithm. Our method is based on the fuzzy c-means clustering algorithm and uses Gene Ontology annotations as prior knowledge to guide the grouping of functionally related genes. Unlike traditional clustering methods, our method can assign genes to multiple clusters, which is a more appropriate representation of the behavior of genes. Two datasets of yeast (Saccharomyces cerevisiae) expression profiles were used to compare our method with other state-of-the-art clustering methods. Our experiments show that our method can produce far more biologically meaningful clusters even when only a small percentage of Gene Ontology annotations is used. In addition, our experiments indicate that the use of prior knowledge in our method can predict gene functions effectively. The source code is freely available at http://sysbio.fulton.asu.edu/gofuzzy/.
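To make the "multiple clusters" idea in the abstract concrete, the following is a minimal sketch of the standard fuzzy c-means algorithm that the method builds on. It is not the authors' GO Fuzzy c-means: the Gene Ontology prior is not modeled here, and the function name, the fuzzifier default m = 2.0, the stopping tolerance, and the toy profiles are all assumptions made for illustration.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means (illustrative; no Gene Ontology prior).

    points: list of equal-length lists of floats (e.g. expression profiles)
    c: number of clusters; m: fuzzifier, must be > 1
    Returns (centers, memberships); memberships[i][k] is the degree to
    which point i belongs to cluster k, and each row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, normalised so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(max_iter):
        # Center update: mean of the points weighted by membership ** m.
        centers = []
        for k in range(c):
            weights = [u[i][k] ** m for i in range(n)]
            wsum = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / wsum
                for d in range(dim)
            ])

        # Membership update from distances to the new centers.
        new_u = []
        for p in points:
            dists = [math.dist(p, ctr) for ctr in centers]
            if any(d == 0.0 for d in dists):
                # Point sits exactly on a center: share membership among those.
                zeros = [1.0 if d == 0.0 else 0.0 for d in dists]
                z = sum(zeros)
                new_u.append([v / z for v in zeros])
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                s = sum(inv)
                new_u.append([v / s for v in inv])

        # Stop once no membership changed by more than tol.
        change = max(
            abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb)
        )
        u = new_u
        if change < tol:
            break

    return centers, u


# Toy "expression profiles": two tight groups plus one point in between.
profiles = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0], [2.5, 2.5]]
centers, memberships = fuzzy_c_means(profiles, c=2)
for profile, row in zip(profiles, memberships):
    print(profile, [round(v, 3) for v in row])
```

In this toy run the in-between profile receives a substantial membership in both clusters rather than being forced into one, which is the soft-assignment behavior the abstract refers to. A semi-supervised variant would additionally have to bias these memberships using annotations; how the paper does that is not specified in this record.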

Original language	English (US)
Pages (from-to)	74-81
Number of pages	8
Journal	Journal of Biomedical Informatics
Volume	42
Issue number	1
DOIs	https://doi.org/10.1016/j.jbi.2008.05.009
State	Published - Feb 2009

Keywords

Fuzzy c-means clustering
Gene Ontology
Gene expression data
Gene function prediction
Saccharomyces cerevisiae yeast
Semi-supervised clustering

ASJC Scopus subject areas

Health Informatics
Computer Science Applications

Cite this

@article{dcaa1dcc20424afd978c117961fe3a2d,
  title = "Fuzzy c-means clustering with prior biological knowledge",
  abstract = "We propose a novel semi-supervised clustering method called GO Fuzzy c-means, which enables the simultaneous use of biological knowledge and gene expression data in a probabilistic clustering algorithm. Our method is based on the fuzzy c-means clustering algorithm and utilizes the Gene Ontology annotations as prior knowledge to guide the process of grouping functionally related genes. Unlike traditional clustering methods, our method is capable of assigning genes to multiple clusters, which is a more appropriate representation of the behavior of genes. Two datasets of yeast (Saccharomyces cerevisiae) expression profiles were applied to compare our method with other state-of-the-art clustering methods. Our experiments show that our method can produce far better biologically meaningful clusters even with the use of a small percentage of Gene Ontology annotations. In addition, our experiments further indicate that the utilization of prior knowledge in our method can predict gene functions effectively. The source code is freely available at http://sysbio.fulton.asu.edu/gofuzzy/.",
  keywords = "Fuzzy c-means clustering, Gene Ontology, Gene expression data, Gene function prediction, Saccharomyces cerevisiae yeast, Semi-supervised clustering",
  author = "Luis Tari and Chitta Baral and Seungchan Kim",
  note = "Funding Information: The authors appreciate insightful comments and editorial helps from Dr. Michael Bittner at the Translational Genomics Research Institute (http://www.tgen.org), Phoenix, AZ 85005. The authors would also like to thank the valuable comments by the anonymous reviewers. SK was partially funded by P01-CA27502-23 (NIH/NCI), P01 CA109552-01A1 (NIH/NCI), U19 AI067773 (NIH/NIAID), and W81XWH-06-1-090 (DoD/CDMRP).",
  year = "2009",
  month = feb,
  doi = "10.1016/j.jbi.2008.05.009",
  language = "English (US)",
  volume = "42",
  pages = "74--81",
  journal = "Journal of Biomedical Informatics",
  issn = "1532-0464",
  publisher = "Academic Press Inc.",
  number = "1",
}

TY  - JOUR
T1  - Fuzzy c-means clustering with prior biological knowledge
AU  - Tari, Luis
AU  - Baral, Chitta
AU  - Kim, Seungchan
N1  - Funding Information: The authors appreciate insightful comments and editorial helps from Dr. Michael Bittner at the Translational Genomics Research Institute (http://www.tgen.org), Phoenix, AZ 85005. The authors would also like to thank the valuable comments by the anonymous reviewers. SK was partially funded by P01-CA27502-23 (NIH/NCI), P01 CA109552-01A1 (NIH/NCI), U19 AI067773 (NIH/NIAID), and W81XWH-06-1-090 (DoD/CDMRP).
PY  - 2009/2
Y1  - 2009/2
N2  - We propose a novel semi-supervised clustering method called GO Fuzzy c-means, which enables the simultaneous use of biological knowledge and gene expression data in a probabilistic clustering algorithm. Our method is based on the fuzzy c-means clustering algorithm and utilizes the Gene Ontology annotations as prior knowledge to guide the process of grouping functionally related genes. Unlike traditional clustering methods, our method is capable of assigning genes to multiple clusters, which is a more appropriate representation of the behavior of genes. Two datasets of yeast (Saccharomyces cerevisiae) expression profiles were applied to compare our method with other state-of-the-art clustering methods. Our experiments show that our method can produce far better biologically meaningful clusters even with the use of a small percentage of Gene Ontology annotations. In addition, our experiments further indicate that the utilization of prior knowledge in our method can predict gene functions effectively. The source code is freely available at http://sysbio.fulton.asu.edu/gofuzzy/.
AB  - We propose a novel semi-supervised clustering method called GO Fuzzy c-means, which enables the simultaneous use of biological knowledge and gene expression data in a probabilistic clustering algorithm. Our method is based on the fuzzy c-means clustering algorithm and utilizes the Gene Ontology annotations as prior knowledge to guide the process of grouping functionally related genes. Unlike traditional clustering methods, our method is capable of assigning genes to multiple clusters, which is a more appropriate representation of the behavior of genes. Two datasets of yeast (Saccharomyces cerevisiae) expression profiles were applied to compare our method with other state-of-the-art clustering methods. Our experiments show that our method can produce far better biologically meaningful clusters even with the use of a small percentage of Gene Ontology annotations. In addition, our experiments further indicate that the utilization of prior knowledge in our method can predict gene functions effectively. The source code is freely available at http://sysbio.fulton.asu.edu/gofuzzy/.
KW  - Fuzzy c-means clustering
KW  - Gene Ontology
KW  - Gene expression data
KW  - Gene function prediction
KW  - Saccharomyces cerevisiae yeast
KW  - Semi-supervised clustering
UR  - http://www.scopus.com/inward/record.url?scp=60049085522&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=60049085522&partnerID=8YFLogxK
U2  - 10.1016/j.jbi.2008.05.009
DO  - 10.1016/j.jbi.2008.05.009
M3  - Article
C2  - 18595779
AN  - SCOPUS:60049085522
SN  - 1532-0464
VL  - 42
SP  - 74
EP  - 81
JO  - Journal of Biomedical Informatics
JF  - Journal of Biomedical Informatics
IS  - 1
ER  -
