IBIS: Interposed big-data I/O scheduler

Yiqi Xu; Ming Zhao

doi:10.1145/2907294.2907319


Ira A. Fulton Schools of Engineering (IAFSE)

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

8 Scopus citations

Abstract

Big-data systems are increasingly shared by diverse, data-intensive applications from different domains. However, existing systems lack support for I/O management, and the performance of big-data applications degrades in unpredictable ways when they contend for I/Os. To address this challenge, this paper proposes IBIS, an Interposed Big-data I/O Scheduler, to provide I/O performance differentiation for competing applications in a shared big-data system. IBIS transparently intercepts, isolates, and schedules an application's different phases of I/Os via an I/O interposition layer on every datanode of the big-data system. It provides a new proportional-share I/O scheduler, SFQ(D2), to allow applications to share the I/O service of each datanode with good fairness and resource utilization. It enables the distributed I/O schedulers to coordinate with one another and to achieve proportional sharing of the big-data system's total I/O service in a scalable manner. Finally, it supports the shared use of big-data resources by diverse frameworks and manages the I/Os from different types of big-data workloads (e.g., batch jobs vs. queries) across these frameworks. The prototype of IBIS is implemented in Hadoop/YARN, a widely used big-data system. Experiments based on a variety of representative applications (WordCount, TeraSort, Facebook, TPC-H) show that IBIS achieves good total-service proportional sharing with low overhead in both application performance and resource usage. IBIS is also shown to support various performance policies: it can deliver stronger performance isolation than native Hadoop/YARN (99% better for WordCount and 15% better for TPC-H queries) with good resource utilization, and it can also achieve perfect proportional slowdown with better application performance (30% better than native Hadoop).
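The abstract names SFQ(D2) as the per-datanode proportional-share scheduler but does not describe how it works. For orientation only, the Python sketch below implements classic SFQ(D), start-time fair queuing that keeps at most D requests outstanding at the device, which SFQ(D2) appears from its name to extend. It is not the IBIS code: the class `SFQD`, the application names, the unit request costs, and the choice of virtual time as the start tag of the most recently dispatched request are all assumptions made for illustration.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class _Request:
    start: float                         # start tag S
    seq: int                             # FIFO tie-breaker for equal start tags
    app: str = field(compare=False)
    cost: float = field(compare=False)


class SFQD:
    """Illustrative SFQ(D): start-time fair queuing with dispatch depth D.

    Hypothetical sketch, not the IBIS implementation of SFQ(D2).
    """

    def __init__(self, weights, depth):
        self.weights = dict(weights)     # app -> proportional-share weight
        self.depth = depth               # D: max requests outstanding at the device
        self.vtime = 0.0                 # assumed v(t): start tag of last dispatch
        self.last_finish = {app: 0.0 for app in self.weights}
        self.queue = []                  # min-heap ordered by (start tag, seq)
        self.outstanding = 0
        self._seq = itertools.count()

    def submit(self, app, cost):
        # S = max(v(t), F of the app's previous request); F = S + cost / weight
        start = max(self.vtime, self.last_finish[app])
        self.last_finish[app] = start + cost / self.weights[app]
        heapq.heappush(self.queue, _Request(start, next(self._seq), app, cost))

    def dispatch(self):
        """Issue queued requests in start-tag order until D are outstanding."""
        issued = []
        while self.queue and self.outstanding < self.depth:
            req = heapq.heappop(self.queue)
            self.vtime = req.start
            self.outstanding += 1
            issued.append((req.app, req.cost))
        return issued

    def complete(self):
        """Record that the device finished one outstanding request."""
        self.outstanding -= 1


# Usage: two backlogged applications with weights 2:1 on a depth-1 device.
sched = SFQD({"A": 2, "B": 1}, depth=1)
for _ in range(6):
    sched.submit("A", 1.0)
    sched.submit("B", 1.0)

order = []
for _ in range(9):
    order += [app for app, _cost in sched.dispatch()]
    sched.complete()

print(order.count("A"), order.count("B"))  # prints: 6 3
```

With weights 2:1 and both applications backlogged, the first nine dispatches split 6:3. The depth D is the trade-off knob in SFQ(D): a larger D keeps the device busier but lets more requests run concurrently regardless of tag order, which weakens fairness. Balancing fairness against utilization is what the abstract says SFQ(D2) is designed for, although this sketch does not model how it does so.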

Original language	English (US)
Title of host publication	HPDC 2016 - Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing
Publisher	Association for Computing Machinery, Inc
Pages	111-122
Number of pages	12
ISBN (Electronic)	9781450343145
DOIs	https://doi.org/10.1145/2907294.2907319
State	Published - May 31 2016
Event	25th ACM International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2016, Kyoto, Japan (May 31 2016 → Jun 4 2016)

Publication series

Name	HPDC 2016 - Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing

Other

Other	25th ACM International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2016
Country/Territory	Japan
City	Kyoto
Period	5/31/16 → 6/4/16

ASJC Scopus subject areas

Computational Theory and Mathematics
Computer Science Applications
Software


Cite this

Xu, Y., & Zhao, M. (2016). IBIS: Interposed big-data I/O scheduler. In HPDC 2016 - Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing (pp. 111-122). (HPDC 2016 - Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing). Association for Computing Machinery, Inc. https://doi.org/10.1145/2907294.2907319

IBIS: Interposed big-data I/O scheduler. / Xu, Yiqi; Zhao, Ming.
HPDC 2016 - Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing. Association for Computing Machinery, Inc, 2016. p. 111-122 (HPDC 2016 - Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing).


Xu, Y & Zhao, M 2016, IBIS: Interposed big-data I/O scheduler. in HPDC 2016 - Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing. HPDC 2016 - Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, Association for Computing Machinery, Inc, pp. 111-122, 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2016, Kyoto, Japan, 5/31/16. https://doi.org/10.1145/2907294.2907319

Xu Y, Zhao M. IBIS: Interposed big-data I/O scheduler. In HPDC 2016 - Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing. Association for Computing Machinery, Inc. 2016. p. 111-122. (HPDC 2016 - Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing). doi: 10.1145/2907294.2907319

@inproceedings{ff430842f5b9474f938a4d09f7171412,
  title = "{IBIS}: Interposed big-data {I/O} scheduler",
  abstract = "Big-data systems are increasingly shared by diverse, data-intensive applications from different domains. However, existing systems lack support for I/O management, and the performance of big-data applications degrades in unpredictable ways when they contend for I/Os. To address this challenge, this paper proposes IBIS, an Interposed Big-data I/O Scheduler, to provide I/O performance differentiation for competing applications in a shared big-data system. IBIS transparently intercepts, isolates, and schedules an application's different phases of I/Os via an I/O interposition layer on every datanode of the big-data system. It provides a new proportional-share I/O scheduler, SFQ(D2), to allow applications to share the I/O service of each datanode with good fairness and resource utilization. It enables the distributed I/O schedulers to coordinate with one another and to achieve proportional sharing of the big-data system's total I/O service in a scalable manner. Finally, it supports the shared use of big-data resources by diverse frameworks and manages the I/Os from different types of big-data workloads (e.g., batch jobs vs. queries) across these frameworks. The prototype of IBIS is implemented in Hadoop/YARN, a widely used big-data system. Experiments based on a variety of representative applications (WordCount, TeraSort, Facebook, TPC-H) show that IBIS achieves good total-service proportional sharing with low overhead in both application performance and resource usage. IBIS is also shown to support various performance policies: it can deliver stronger performance isolation than native Hadoop/YARN (99\% better for WordCount and 15\% better for TPC-H queries) with good resource utilization, and it can also achieve perfect proportional slowdown with better application performance (30\% better than native Hadoop).",
  author = "Yiqi Xu and Ming Zhao",
  year = "2016",
  month = may,
  day = "31",
  doi = "10.1145/2907294.2907319",
  language = "English (US)",
  series = "HPDC 2016 - Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing",
  publisher = "Association for Computing Machinery, Inc",
  pages = "111--122",
  booktitle = "HPDC 2016 - Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing",
  note = "25th ACM International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2016 ; Conference date: 31-05-2016 Through 04-06-2016",
}

TY  - GEN
T1  - IBIS: Interposed big-data I/O scheduler
T2  - 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2016
AU  - Xu, Yiqi
AU  - Zhao, Ming
PY  - 2016/5/31
Y1  - 2016/5/31
AB  - Big-data systems are increasingly shared by diverse, data-intensive applications from different domains. However, existing systems lack support for I/O management, and the performance of big-data applications degrades in unpredictable ways when they contend for I/Os. To address this challenge, this paper proposes IBIS, an Interposed Big-data I/O Scheduler, to provide I/O performance differentiation for competing applications in a shared big-data system. IBIS transparently intercepts, isolates, and schedules an application's different phases of I/Os via an I/O interposition layer on every datanode of the big-data system. It provides a new proportional-share I/O scheduler, SFQ(D2), to allow applications to share the I/O service of each datanode with good fairness and resource utilization. It enables the distributed I/O schedulers to coordinate with one another and to achieve proportional sharing of the big-data system's total I/O service in a scalable manner. Finally, it supports the shared use of big-data resources by diverse frameworks and manages the I/Os from different types of big-data workloads (e.g., batch jobs vs. queries) across these frameworks. The prototype of IBIS is implemented in Hadoop/YARN, a widely used big-data system. Experiments based on a variety of representative applications (WordCount, TeraSort, Facebook, TPC-H) show that IBIS achieves good total-service proportional sharing with low overhead in both application performance and resource usage. IBIS is also shown to support various performance policies: it can deliver stronger performance isolation than native Hadoop/YARN (99% better for WordCount and 15% better for TPC-H queries) with good resource utilization, and it can also achieve perfect proportional slowdown with better application performance (30% better than native Hadoop).
UR  - http://www.scopus.com/inward/record.url?scp=84978520087&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84978520087&partnerID=8YFLogxK
U2  - 10.1145/2907294.2907319
DO  - 10.1145/2907294.2907319
M3  - Conference contribution
AN  - SCOPUS:84978520087
T3  - HPDC 2016 - Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing
SP  - 111
EP  - 122
BT  - HPDC 2016 - Proceedings of the 25th ACM International Symposium on High-Performance Parallel and Distributed Computing
PB  - Association for Computing Machinery, Inc
Y2  - 31 May 2016 through 4 June 2016
ER  - 
