A comparative analysis of clustering algorithms: O₂ migration in truncated hemoglobin N from transition networks

Pierre André Cazade; Wenwei Zheng; Diego Prada-Gracia; Ganna Berezovska; Francesco Rao; Cecilia Clementi; Markus Meuwly

doi:10.1063/1.4904431

Research output: Contribution to journal › Article › peer-review

10 Scopus citations

Abstract

The ligand migration network for O₂-diffusion in truncated Hemoglobin N is analyzed based on three different clustering schemes. For coordinate-based clustering, the conventional k-means and the kinetics-based Markov Clustering (MCL) methods are employed, whereas the locally scaled diffusion map (LSDMap) method is a collective-variable-based approach. It is found that all three methods agree well in their geometrical definition of the most important docking site, and all experimentally known docking sites are recovered by all three methods. Also, for most of the states, their population coincides quite favourably, whereas the kinetics of and between the states differs. One of the major differences between k-means and MCL clustering on the one hand and LSDMap on the other is that the latter finds one large primary cluster containing the Xe1a, IS1, and ENT states. This is related to the fact that the motion within the state occurs on similar time scales, whereas structurally the state is found to be quite diverse. In agreement with previous explicit atomistic simulations, the Xe3 pocket is found to be a highly dynamical site which points to its potential role as a hub in the network. This is also highlighted in the fact that LSDMap cannot identify this state. First passage time distributions from MCL clusterings using a one- (ligand-position) and two-dimensional (ligand-position and protein-structure) descriptor suggest that ligand- and protein-motions are coupled. The benefits and drawbacks of the three methods are discussed in a comparative fashion and highlight that depending on the questions at hand the best-performing method for a particular data set may differ.
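The transition-network and first-passage-time quantities mentioned in the abstract can be illustrated with a minimal sketch. It assumes the trajectory has already been discretized into integer state labels by some clustering method; the function names, the toy trajectory, and the particular first-passage convention (time from the first entry into the source state until the target state is first reached) are illustrative assumptions, not the authors' implementation.

```python
def transition_counts(states, n_states, lag=1):
    """Count transitions i -> j between frames separated by `lag` (lag >= 1)."""
    counts = [[0] * n_states for _ in range(n_states)]
    for i, j in zip(states[:-lag], states[lag:]):
        counts[i][j] += 1
    return counts


def transition_probabilities(counts):
    """Row-normalize a count matrix; rows with no counts stay all zero."""
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])
    return probs


def first_passage_times(states, source, target):
    """Frames elapsed from the first entry into `source` until `target` is
    first reached. The clock restarts only after `target` has been hit."""
    times = []
    start = None
    prev = None
    for t, s in enumerate(states):
        if s == source and prev != source and start is None:
            start = t  # first entry into the source state
        if s == target and start is not None:
            times.append(t - start)
            start = None  # wait for the next entry into the source state
        prev = s
    return times


# Hypothetical toy trajectory over three states (labels 0, 1, 2).
toy_trajectory = [0, 0, 1, 0, 2, 2, 0, 1, 2]
toy_counts = transition_counts(toy_trajectory, n_states=3)
print(toy_counts)
print(transition_probabilities(toy_counts))
print(first_passage_times(toy_trajectory, source=0, target=2))
```

In practice the state labels would come from k-means, MCL, or LSDMap cluster assignments, and the resulting first-passage-time distributions could then be compared across the clustering schemes.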

Original language	English (US)
Article number	025103
Journal	Journal of Chemical Physics
Volume	142
Issue number	2
DOIs	https://doi.org/10.1063/1.4904431
State	Published - Jan 14 2015
Externally published	Yes

ASJC Scopus subject areas

General Physics and Astronomy
Physical and Theoretical Chemistry

Cite this

@article{e373fa429db6400fb449dfd9e952479c,

title = "A comparative analysis of clustering algorithms: {O2} migration in truncated hemoglobin {N} from transition networks",

abstract = "The ligand migration network for O2-diffusion in truncated Hemoglobin N is analyzed based on three different clustering schemes. For coordinate-based clustering, the conventional k-means and the kinetics-based Markov Clustering (MCL) methods are employed, whereas the locally scaled diffusion map (LSDMap) method is a collective-variable-based approach. It is found that all three methods agree well in their geometrical definition of the most important docking site, and all experimentally known docking sites are recovered by all three methods. Also, for most of the states, their population coincides quite favourably, whereas the kinetics of and between the states differs. One of the major differences between k-means and MCL clustering on the one hand and LSDMap on the other is that the latter finds one large primary cluster containing the Xe1a, IS1, and ENT states. This is related to the fact that the motion within the state occurs on similar time scales, whereas structurally the state is found to be quite diverse. In agreement with previous explicit atomistic simulations, the Xe3 pocket is found to be a highly dynamical site which points to its potential role as a hub in the network. This is also highlighted in the fact that LSDMap cannot identify this state. First passage time distributions from MCL clusterings using a one- (ligand-position) and two-dimensional (ligand-position and protein-structure) descriptor suggest that ligand- and protein-motions are coupled. The benefits and drawbacks of the three methods are discussed in a comparative fashion and highlight that depending on the questions at hand the best-performing method for a particular data set may differ.",

author = "Cazade, {Pierre Andr{\'e}} and Wenwei Zheng and Diego Prada-Gracia and Ganna Berezovska and Francesco Rao and Cecilia Clementi and Markus Meuwly",

note = "Publisher Copyright: {\textcopyright} 2015 AIP Publishing LLC.",

year = "2015",

month = jan,

day = "14",

doi = "10.1063/1.4904431",

language = "English (US)",

volume = "142",

journal = "Journal of Chemical Physics",

issn = "0021-9606",

publisher = "American Institute of Physics Publishing LLC",

number = "2",

}

TY - JOUR

T1 - A comparative analysis of clustering algorithms

T2 - O2 migration in truncated hemoglobin N from transition networks

AU - Cazade, Pierre André

AU - Zheng, Wenwei

AU - Prada-Gracia, Diego

AU - Berezovska, Ganna

AU - Rao, Francesco

AU - Clementi, Cecilia

AU - Meuwly, Markus

PY - 2015/1/14

Y1 - 2015/1/14

AB - The ligand migration network for O2-diffusion in truncated Hemoglobin N is analyzed based on three different clustering schemes. For coordinate-based clustering, the conventional k-means and the kinetics-based Markov Clustering (MCL) methods are employed, whereas the locally scaled diffusion map (LSDMap) method is a collective-variable-based approach. It is found that all three methods agree well in their geometrical definition of the most important docking site, and all experimentally known docking sites are recovered by all three methods. Also, for most of the states, their population coincides quite favourably, whereas the kinetics of and between the states differs. One of the major differences between k-means and MCL clustering on the one hand and LSDMap on the other is that the latter finds one large primary cluster containing the Xe1a, IS1, and ENT states. This is related to the fact that the motion within the state occurs on similar time scales, whereas structurally the state is found to be quite diverse. In agreement with previous explicit atomistic simulations, the Xe3 pocket is found to be a highly dynamical site which points to its potential role as a hub in the network. This is also highlighted in the fact that LSDMap cannot identify this state. First passage time distributions from MCL clusterings using a one- (ligand-position) and two-dimensional (ligand-position and protein-structure) descriptor suggest that ligand- and protein-motions are coupled. The benefits and drawbacks of the three methods are discussed in a comparative fashion and highlight that depending on the questions at hand the best-performing method for a particular data set may differ.

UR - http://www.scopus.com/inward/record.url?scp=84923884841&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84923884841&partnerID=8YFLogxK

U2 - 10.1063/1.4904431

DO - 10.1063/1.4904431

M3 - Article

C2 - 25591387

AN - SCOPUS:84923884841

SN - 0021-9606

VL - 142

JO - Journal of Chemical Physics

JF - Journal of Chemical Physics

IS - 2

M1 - 025103

ER -
