vPFS+: Managing I/O Performance for Diverse HPC Applications

Ming Zhao, Yiqi Xu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

High-performance computing (HPC) systems are increasingly shared by a variety of data- and metadata-intensive parallel applications. However, existing parallel file systems employed for HPC storage management are unable to differentiate the I/O requests from concurrent applications and meet their different performance requirements. Previous work, vPFS, provided a solution to this problem by virtualizing a parallel file system and enabling proportional-share bandwidth allocation to the applications, but it cannot handle the increasingly diverse applications in today's HPC environments, including those that have different sizes of I/Os and those that are metadata-intensive. This paper presents vPFS+, which builds upon the virtualization framework provided by vPFS but addresses its limitations in supporting diverse HPC applications. First, a new proportional-share I/O scheduler, SFQ(D)+, is created to allow applications with various I/O sizes and issue rates to share the storage with good application-level fairness and system-level utilization. Second, vPFS+ extends the scheduling to also include metadata I/Os and provides performance isolation to metadata-intensive applications. vPFS+ is prototyped on PVFS2, a widely used open-source parallel file system, and evaluated using a comprehensive set of representative HPC benchmarks and applications (IOR, NPB BTIO, WRF, and multi-md-test). The results confirm that the new SFQ(D)+ scheduler can provide significantly better performance isolation to applications with small, bursty I/Os than the traditional SFQ(D) scheduler (3.35 times better) and the native PVFS2 (8.25 times better) while still making efficient use of the storage. The results also show that vPFS+ can deliver near-perfect proportional sharing (>95% of the target sharing ratio) to metadata-intensive applications.
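
As a rough illustration of the proportional-share scheduling that the abstract refers to, the following is a minimal sketch of the classic SFQ(D) idea (start-time fair queueing with a bounded number of outstanding requests). It is not the SFQ(D)+ scheduler, whose internals the abstract does not describe. The class name `SFQD`, its methods, the flow names, and the abstract per-request `cost` are all assumptions made for this example.

```python
import heapq
from collections import defaultdict


class SFQD:
    """Illustrative sketch of classic SFQ(D); names here are hypothetical."""

    def __init__(self, weights, depth):
        self.weights = weights                 # flow name -> share weight
        self.depth = depth                     # max requests outstanding at storage
        self.vtime = 0.0                       # start tag of the last dispatched request
        self.last_finish = defaultdict(float)  # finish tag of each flow's previous request
        self.queue = []                        # min-heap ordered by (start tag, arrival order)
        self.outstanding = 0
        self.seq = 0

    def submit(self, flow, cost):
        # A request starts no earlier than the current virtual time and no
        # earlier than the finish tag of the same flow's previous request.
        start = max(self.vtime, self.last_finish[flow])
        self.last_finish[flow] = start + cost / self.weights[flow]
        heapq.heappush(self.queue, (start, self.seq, flow, cost))
        self.seq += 1

    def dispatch(self):
        # Release queued requests in start-tag order, up to the depth limit.
        released = []
        while self.queue and self.outstanding < self.depth:
            start, _, flow, cost = heapq.heappop(self.queue)
            self.vtime = start
            self.outstanding += 1
            released.append((flow, cost))
        return released

    def complete(self):
        # Called when the storage finishes one outstanding request.
        self.outstanding -= 1


# Two backlogged flows with a 2:1 target sharing ratio and equal-cost requests.
sched = SFQD({"A": 2.0, "B": 1.0}, depth=1)
for _ in range(30):
    sched.submit("A", cost=1.0)
for _ in range(30):
    sched.submit("B", cost=1.0)

counts = {"A": 0, "B": 0}
for _ in range(30):
    for flow, _cost in sched.dispatch():
        counts[flow] += 1
        sched.complete()

print(counts)  # {'A': 20, 'B': 10}
```

In this toy run the first 30 dispatched requests split 20 to 10, matching the 2:1 weights. A scheme like this assumes both flows stay backlogged; the abstract reports that small, bursty I/Os are where the traditional SFQ(D) isolates applications less well than SFQ(D)+.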

Original language: English (US)
Title of host publication: Proceedings - 2019 35th Symposium on Mass Storage Systems and Technologies, MSST 2019
Publisher: IEEE Computer Society
Pages: 51-64
Number of pages: 14
ISBN (Electronic): 9781728139203
State: Published - May 2019
Event: 35th Symposium on Mass Storage Systems and Technologies, MSST 2019 - Santa Clara, United States
Duration: May 20 2019 to May 24 2019

Publication series

Name: IEEE Symposium on Mass Storage Systems and Technologies
Volume: 2019-May
ISSN (Print): 2160-1968

Conference

Conference: 35th Symposium on Mass Storage Systems and Technologies, MSST 2019
Country/Territory: United States
City: Santa Clara
Period: 5/20/19 to 5/24/19

Keywords

  • I/O scheduling
  • parallel storage
  • performance management

ASJC Scopus subject areas

  • Hardware and Architecture
  • Electrical and Electronic Engineering
