MRTuner: A toolkit to enable holistic optimization for MapReduce jobs

Juwei Shi; Jia Zou; Jiaheng Lu; Zhao Cao; Shiqiang Li; Chen Wang

doi:10.14778/2733004.2733005

Research output: Contribution to journal › Conference article › peer-review

68 Scopus citations

Abstract

MapReduce-based data-intensive computing solutions are increasingly deployed as production systems. Unlike Internet companies that invented and adopted the technology from the very beginning, traditional enterprises demand easy-to-use software because their administrators have limited capabilities. Automatic job optimization software for MapReduce is a promising technique to satisfy such requirements. In this paper, we introduce a toolkit from IBM, called MRTuner, to enable holistic optimization for MapReduce jobs. In particular, we propose a novel Producer-Transporter-Consumer (PTC) model, which characterizes the tradeoffs in the parallel execution among tasks. We also carefully investigate the complicated relations among about twenty parameters, which have a significant impact on job performance. We design an efficient search algorithm to find the optimal execution plan. Finally, we conduct a thorough experimental evaluation on two different types of clusters using the HiBench suite, which covers various Hadoop workloads from GB to TB size levels. The results show that the search latency of MRTuner is a few orders of magnitude lower than that of the state-of-the-art cost-based optimizer, and the effectiveness of the optimized execution plan is also significantly improved.

Original language	English (US)
Pages (from-to)	1319-1330
Number of pages	12
Journal	Proceedings of the VLDB Endowment
Volume	7
Issue number	13
DOIs	https://doi.org/10.14778/2733004.2733005
State	Published - 2014
Externally published	Yes
Event	Proceedings of the 40th International Conference on Very Large Data Bases, VLDB 2014 - Hangzhou, China
Duration	Sep 1 2014 → Sep 5 2014

ASJC Scopus subject areas

Computer Science (miscellaneous)
General Computer Science

Cite this

@article{948d9cfc6ca942dcba553c0b0c4bb1a7,

title = "MRTuner: A toolkit to enable holistic optimization for MapReduce jobs",

abstract = "MapReduce based data-intensive computing solutions are increasingly deployed as production systems. Unlike Internet companies who invent and adopt the technology from the very beginning, traditional enterprises demand easy-to-use software due to the limited capabilities of administrators. Automatic job optimization software for MapReduce is a promising technique to satisfy such requirements. In this paper, we introduce a toolkit from IBM, called MRTuner, to enable holistic optimization for MapReduce jobs. In particular, we propose a novel Producer-Transporter-Consumer (PTC) model, which characterizes the tradeoffs in the parallel execution among tasks. We also carefully investigate the complicated relations among about twenty parameters, which have significant impact on the job performance. We design an efficient search algorithm to find the optimal execution plan. Finally, we conduct a thorough experimental evaluation on two different types of clusters using the HiBench suite which covers various Hadoop workloads from GB to TB size levels. The results show that the search latency of MRTuner is a few orders of magnitude faster than that of the state-of-the-art cost-based optimizer, and the effectiveness of the optimized execution plan is also significantly improved.",

author = "Juwei Shi and Jia Zou and Jiaheng Lu and Zhao Cao and Shiqiang Li and Chen Wang",

year = "2014",

doi = "10.14778/2733004.2733005",

language = "English (US)",

volume = "7",

pages = "1319--1330",

journal = "Proceedings of the VLDB Endowment",

issn = "2150-8097",

publisher = "Very Large Data Base Endowment Inc.",

number = "13",

note = "Proceedings of the 40th International Conference on Very Large Data Bases, VLDB 2014 ; Conference date: 01-09-2014 Through 05-09-2014",

}

TY - JOUR

T1 - MRTuner: A toolkit to enable holistic optimization for MapReduce jobs

T2 - Proceedings of the 40th International Conference on Very Large Data Bases, VLDB 2014

AU - Shi, Juwei

AU - Zou, Jia

AU - Lu, Jiaheng

AU - Cao, Zhao

AU - Li, Shiqiang

AU - Wang, Chen

PY - 2014

Y1 - 2014

AB - MapReduce based data-intensive computing solutions are increasingly deployed as production systems. Unlike Internet companies who invent and adopt the technology from the very beginning, traditional enterprises demand easy-to-use software due to the limited capabilities of administrators. Automatic job optimization software for MapReduce is a promising technique to satisfy such requirements. In this paper, we introduce a toolkit from IBM, called MRTuner, to enable holistic optimization for MapReduce jobs. In particular, we propose a novel Producer-Transporter-Consumer (PTC) model, which characterizes the tradeoffs in the parallel execution among tasks. We also carefully investigate the complicated relations among about twenty parameters, which have significant impact on the job performance. We design an efficient search algorithm to find the optimal execution plan. Finally, we conduct a thorough experimental evaluation on two different types of clusters using the HiBench suite which covers various Hadoop workloads from GB to TB size levels. The results show that the search latency of MRTuner is a few orders of magnitude faster than that of the state-of-the-art cost-based optimizer, and the effectiveness of the optimized execution plan is also significantly improved.

UR - http://www.scopus.com/inward/record.url?scp=84905856702&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84905856702&partnerID=8YFLogxK

U2 - 10.14778/2733004.2733005

DO - 10.14778/2733004.2733005

M3 - Conference article

AN - SCOPUS:84905856702

SN - 2150-8097

VL - 7

SP - 1319

EP - 1330

JO - Proceedings of the VLDB Endowment

JF - Proceedings of the VLDB Endowment

IS - 13

Y2 - 1 September 2014 through 5 September 2014

ER -
