Parallel performance of molecular dynamics trajectory analysis

Mahzad Khoshlessan, Ioannis Paraskevakos, Geoffrey C. Fox, Shantenu Jha, Oliver Beckstein

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

The performance of biomolecular molecular dynamics simulations has steadily increased on modern high-performance computing resources but acceleration of the analysis of the output trajectories has lagged behind so that analyzing simulations is becoming a bottleneck. To close this gap, we studied the performance of trajectory analysis with message passing interface (MPI) parallelization and the Python MDAnalysis library on three different Extreme Science and Engineering Discovery Environment (XSEDE) supercomputers where trajectories were read from a Lustre parallel file system. Strong scaling performance was impeded by stragglers, MPI processes that were slower than the typical process. Stragglers were less prevalent for compute-bound workloads, thus pointing to file reading as a bottleneck for scaling. However, a more complicated picture emerged in which both the computation and the data ingestion exhibited close to ideal strong scaling behavior whereas stragglers were primarily caused by either large MPI communication costs or long times to open the single shared trajectory file. We improved overall strong scaling performance by either subfiling (splitting the trajectory into separate files) or MPI-IO with parallel HDF5 trajectory files. The parallel HDF5 approach resulted in near ideal strong scaling on up to 384 cores (16 nodes), thus reducing trajectory analysis times by two orders of magnitude compared with the serial approach.
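The quantities named in the abstract (dividing trajectory frames among MPI processes, strong scaling, and stragglers) can be illustrated with a minimal pure-Python sketch. It is not the authors' code and uses neither MPI nor MDAnalysis. The function names, the contiguous block decomposition, and the `factor=1.5` threshold for calling a process a straggler are assumptions made here for illustration only; the abstract defines a straggler only as a process slower than the typical process.

```python
from statistics import median


def frame_blocks(n_frames, n_ranks):
    """Split frame indices 0..n_frames-1 into n_ranks contiguous,
    near-equal blocks (hypothetical helper, one block per process)."""
    base, extra = divmod(n_frames, n_ranks)
    blocks = []
    start = 0
    for rank in range(n_ranks):
        # The first `extra` ranks take one additional frame each.
        stop = start + base + (1 if rank < extra else 0)
        blocks.append(range(start, stop))
        start = stop
    return blocks


def speedup_and_efficiency(t_serial, t_parallel, n_ranks):
    """Strong-scaling speedup S = t_serial / t_parallel and efficiency S / N."""
    speedup = t_serial / t_parallel
    return speedup, speedup / n_ranks


def find_stragglers(rank_times, factor=1.5):
    """Return ranks whose time exceeds factor * median time.
    The threshold is an assumption, not a definition from the paper."""
    typical = median(rank_times)
    return [rank for rank, t in enumerate(rank_times) if t > factor * typical]


if __name__ == "__main__":
    blocks = frame_blocks(10, 4)
    print([list(b) for b in blocks])
    print(speedup_and_efficiency(100.0, 25.0, 8))
    print(find_stragglers([1.0, 1.1, 0.9, 3.0]))
```

Because every process must finish before results can be combined, the parallel run time is set by the slowest process, which is why a single straggler is enough to pull strong scaling away from the ideal.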

Original language: English (US)
Article number: e5789
Journal: Concurrency Computation Practice and Experience
Volume: 32
Issue number: 19
State: Published - Oct 10 2020

Keywords

  • HDF5
  • HPC
  • MDAnalysis
  • MPI
  • MPI I/O
  • Python
  • big data
  • molecular dynamics
  • straggler
  • trajectory analysis

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Software
  • Computer Science Applications
  • Computer Networks and Communications
  • Computational Theory and Mathematics
