TY - GEN
T1 - vPFS+
T2 - 35th Symposium on Mass Storage Systems and Technologies, MSST 2019
AU - Zhao, Ming
AU - Xu, Yiqi
N1 - Funding Information:
The authors thank the anonymous reviewers for their helpful comments. This research is sponsored by National Science Foundation CAREER award CNS-1619653 and grants CNS-1562837, CNS-1629888, CMMI-1610282, and IIS-1633381.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/5
Y1 - 2019/5
AB - High-performance computing (HPC) systems are increasingly shared by a variety of data- and metadata-intensive parallel applications. However, existing parallel file systems employed for HPC storage management are unable to differentiate the I/O requests from concurrent applications and meet their different performance requirements. Previous work, vPFS, provided a solution to this problem by virtualizing a parallel file system and enabling proportional-share bandwidth allocation to the applications, but it cannot handle the increasingly diverse applications in today's HPC environments, including those that have different sizes of I/Os and those that are metadata-intensive. This paper presents vPFS+, which builds upon the virtualization framework provided by vPFS but addresses its limitations in supporting diverse HPC applications. First, a new proportional-share I/O scheduler, SFQ(D)+, is created to allow applications with various I/O sizes and issue rates to share the storage with good application-level fairness and system-level utilization. Second, vPFS+ extends the scheduling to also include metadata I/Os and provides performance isolation to metadata-intensive applications. vPFS+ is prototyped on PVFS2, a widely used open-source parallel file system, and evaluated using a comprehensive set of representative HPC benchmarks and applications (IOR, NPB BTIO, WRF, and multi-md-test). The results confirm that the new SFQ(D)+ scheduler can provide significantly better performance isolation to applications with small, bursty I/Os than the traditional SFQ(D) scheduler (3.35 times better) and the native PVFS2 (8.25 times better) while still making efficient use of the storage. The results also show that vPFS+ can deliver near-perfect proportional sharing (>95% of the target sharing ratio) to metadata-intensive applications.
KW - I/O scheduling
KW - parallel storage
KW - performance management
UR - http://www.scopus.com/inward/record.url?scp=85075026837&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85075026837&partnerID=8YFLogxK
U2 - 10.1109/MSST.2019.00-16
DO - 10.1109/MSST.2019.00-16
M3 - Conference contribution
AN - SCOPUS:85075026837
T3 - IEEE Symposium on Mass Storage Systems and Technologies
SP - 51
EP - 64
BT - Proceedings - 2019 35th Symposium on Mass Storage Systems and Technologies, MSST 2019
PB - IEEE Computer Society
Y2 - 20 May 2019 through 24 May 2019
ER -