Bicluster Sampled Coherence Metric (BSCM) provides an accurate environmental context for phenotype predictions

Samuel A. Danziger, David J. Reiss, Alexander V. Ratushny, Jennifer J. Smith, Christopher Plaisier, John D. Aitchison, Nitin S. Baliga

Research output: Contribution to journal › Article

Abstract

Background: Biclustering is a popular method for identifying the experimental conditions under which biological signatures are co-expressed. However, the general biclustering problem is NP-hard, which leaves room to focus algorithms on specific biological tasks. We hypothesize that conditional co-regulation of genes is a key factor in determining cell phenotype and that accurately segregating conditions in biclusters will improve such predictions. Thus, we developed a bicluster sampled coherence metric (BSCM) for determining which conditions and signals should be included in a bicluster.

Results: Our BSCM calculates condition- and cluster-size-specific p-values, and we incorporated these into the popular integrated biclustering algorithm cMonkey. We demonstrate that incorporation of our new algorithm significantly improves bicluster co-regulation scores (p-value = 0.009) and GO annotation scores (p-value = 0.004). Additionally, we used a bicluster-based signal to predict whether a given experimental condition will result in yeast peroxisome induction. Using the new algorithm, the classifier accuracy improves from 41.9% to 76.1%.

Conclusions: We demonstrate that the proposed BSCM helps determine which signals ought to be co-clustered, resulting in more accurately assigned bicluster membership. Furthermore, we show that BSCM can be extended to more accurately detect the experimental conditions under which the genes are co-clustered. Features derived from this more accurate analysis of conditional regulation result in a dramatic improvement in the ability to predict a cellular phenotype in yeast. The latest cMonkey is available for download at https://github.com/baliga-lab/cmonkey2. The experimental data and source code featured in this paper are available at http://AitchisonLab.com/BSCM. BSCM has been incorporated in the official cMonkey release.
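The abstract says only that BSCM yields p-values specific to a condition and a cluster size. As a rough illustration of what a sampled, size-matched p-value can look like, the Python sketch below compares the spread of a candidate gene set in one condition against random gene sets of the same size. The function name, the data layout, the use of variance as the coherence statistic, and the sampling scheme are all assumptions made for this example; this is not the published BSCM definition and not the cMonkey API.

```python
import random
import statistics


def sampled_coherence_pvalue(expr, cluster_genes, condition, n_samples=1000, seed=0):
    """Empirical p-value for how tightly `cluster_genes` agree in one condition.

    expr: dict mapping gene name -> dict mapping condition name -> expression value.
    Hypothetical helper: the statistic (population variance) and the background
    sampling are illustrative assumptions, not the method from the paper.
    """
    rng = random.Random(seed)
    all_genes = sorted(expr)
    k = len(cluster_genes)

    def spread(genes):
        # Lower variance means the genes are more coherent in this condition.
        return statistics.pvariance([expr[g][condition] for g in genes])

    observed = spread(cluster_genes)
    # Background: random gene sets of the same size, scored in the same condition.
    hits = sum(
        spread(rng.sample(all_genes, k)) <= observed for _ in range(n_samples)
    )
    # Add-one smoothing keeps the estimate strictly positive.
    return (hits + 1) / (n_samples + 1)
```

Under these assumptions, a small p-value for a condition suggests the gene set is unusually coherent there, so the condition is a candidate for inclusion in the bicluster; a large p-value suggests it is not. Because the background is rebuilt for each condition and each set size, the resulting p-values are comparable across biclusters of different sizes.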

Original language: English (US)
Article number: S1
Journal: BMC Systems Biology
Volume: 9
Issue number: 2
DOI: 10.1186/1752-0509-9-S2-S1
State: Published - Apr 15 2015
Externally published: Yes

Fingerprint

Phenotype
Biclustering
Metric
Prediction
p-Value
Yeasts
Yeast
Peroxisomes
Information Storage and Retrieval
Genes
Gene
Predict
Demonstrate
Annotation
Context
Computational complexity
Proof by induction
Classifiers
Signature
NP-complete problem

ASJC Scopus subject areas

  • Structural Biology
  • Modeling and Simulation
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

Cite this

Bicluster Sampled Coherence Metric (BSCM) provides an accurate environmental context for phenotype predictions. / Danziger, Samuel A.; Reiss, David J.; Ratushny, Alexander V.; Smith, Jennifer J.; Plaisier, Christopher; Aitchison, John D.; Baliga, Nitin S.

In: BMC Systems Biology, Vol. 9, No. 2, S1, 15.04.2015.

Danziger, Samuel A. ; Reiss, David J. ; Ratushny, Alexander V. ; Smith, Jennifer J. ; Plaisier, Christopher ; Aitchison, John D. ; Baliga, Nitin S. / Bicluster Sampled Coherence Metric (BSCM) provides an accurate environmental context for phenotype predictions. In: BMC Systems Biology. 2015 ; Vol. 9, No. 2.
@article{ebb1667bc2a54d739e50d5cfff80051b,
title = "Bicluster Sampled Coherence Metric (BSCM) provides an accurate environmental context for phenotype predictions",
abstract = "Background: Biclustering is a popular method for identifying the experimental conditions under which biological signatures are co-expressed. However, the general biclustering problem is NP-hard, which leaves room to focus algorithms on specific biological tasks. We hypothesize that conditional co-regulation of genes is a key factor in determining cell phenotype and that accurately segregating conditions in biclusters will improve such predictions. Thus, we developed a bicluster sampled coherence metric (BSCM) for determining which conditions and signals should be included in a bicluster. Results: Our BSCM calculates condition- and cluster-size-specific p-values, and we incorporated these into the popular integrated biclustering algorithm cMonkey. We demonstrate that incorporation of our new algorithm significantly improves bicluster co-regulation scores (p-value = 0.009) and GO annotation scores (p-value = 0.004). Additionally, we used a bicluster-based signal to predict whether a given experimental condition will result in yeast peroxisome induction. Using the new algorithm, the classifier accuracy improves from 41.9{\%} to 76.1{\%}. Conclusions: We demonstrate that the proposed BSCM helps determine which signals ought to be co-clustered, resulting in more accurately assigned bicluster membership. Furthermore, we show that BSCM can be extended to more accurately detect the experimental conditions under which the genes are co-clustered. Features derived from this more accurate analysis of conditional regulation result in a dramatic improvement in the ability to predict a cellular phenotype in yeast. The latest cMonkey is available for download at https://github.com/baliga-lab/cmonkey2. The experimental data and source code featured in this paper are available at http://AitchisonLab.com/BSCM. BSCM has been incorporated in the official cMonkey release.",
author = "Danziger, {Samuel A.} and Reiss, {David J.} and Ratushny, {Alexander V.} and Smith, {Jennifer J.} and Plaisier, Christopher and Aitchison, {John D.} and Baliga, {Nitin S.}",
year = "2015",
month = "4",
day = "15",
doi = "10.1186/1752-0509-9-S2-S1",
language = "English (US)",
volume = "9",
journal = "BMC Systems Biology",
issn = "1752-0509",
publisher = "BioMed Central",
number = "2",

}

TY - JOUR

T1 - Bicluster Sampled Coherence Metric (BSCM) provides an accurate environmental context for phenotype predictions

AU - Danziger, Samuel A.

AU - Reiss, David J.

AU - Ratushny, Alexander V.

AU - Smith, Jennifer J.

AU - Plaisier, Christopher

AU - Aitchison, John D.

AU - Baliga, Nitin S.

PY - 2015/4/15

Y1 - 2015/4/15

AB - Background: Biclustering is a popular method for identifying the experimental conditions under which biological signatures are co-expressed. However, the general biclustering problem is NP-hard, which leaves room to focus algorithms on specific biological tasks. We hypothesize that conditional co-regulation of genes is a key factor in determining cell phenotype and that accurately segregating conditions in biclusters will improve such predictions. Thus, we developed a bicluster sampled coherence metric (BSCM) for determining which conditions and signals should be included in a bicluster. Results: Our BSCM calculates condition- and cluster-size-specific p-values, and we incorporated these into the popular integrated biclustering algorithm cMonkey. We demonstrate that incorporation of our new algorithm significantly improves bicluster co-regulation scores (p-value = 0.009) and GO annotation scores (p-value = 0.004). Additionally, we used a bicluster-based signal to predict whether a given experimental condition will result in yeast peroxisome induction. Using the new algorithm, the classifier accuracy improves from 41.9% to 76.1%. Conclusions: We demonstrate that the proposed BSCM helps determine which signals ought to be co-clustered, resulting in more accurately assigned bicluster membership. Furthermore, we show that BSCM can be extended to more accurately detect the experimental conditions under which the genes are co-clustered. Features derived from this more accurate analysis of conditional regulation result in a dramatic improvement in the ability to predict a cellular phenotype in yeast. The latest cMonkey is available for download at https://github.com/baliga-lab/cmonkey2. The experimental data and source code featured in this paper are available at http://AitchisonLab.com/BSCM. BSCM has been incorporated in the official cMonkey release.

UR - http://www.scopus.com/inward/record.url?scp=84961575939&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84961575939&partnerID=8YFLogxK

U2 - 10.1186/1752-0509-9-S2-S1

DO - 10.1186/1752-0509-9-S2-S1

M3 - Article

VL - 9

JO - BMC Systems Biology

JF - BMC Systems Biology

SN - 1752-0509

IS - 2

M1 - S1

ER -