MRTuner: A toolkit to enable holistic optimization for MapReduce jobs

Juwei Shi, Jia Zou, Jiaheng Lu, Zhao Cao, Shiqiang Li, Chen Wang

Research output: Contribution to journal › Conference article › peer-review

68 Scopus citations

Abstract

MapReduce-based data-intensive computing solutions are increasingly deployed as production systems. Unlike Internet companies, which invented and adopted the technology from the very beginning, traditional enterprises demand easy-to-use software because their administrators have limited capabilities. Automatic job optimization software for MapReduce is a promising technique to satisfy such requirements. In this paper, we introduce a toolkit from IBM, called MRTuner, to enable holistic optimization for MapReduce jobs. In particular, we propose a novel Producer-Transporter-Consumer (PTC) model, which characterizes the tradeoffs in the parallel execution among tasks. We also carefully investigate the complicated relations among about twenty parameters that have a significant impact on job performance. We design an efficient search algorithm to find the optimal execution plan. Finally, we conduct a thorough experimental evaluation on two different types of clusters using the HiBench suite, which covers various Hadoop workloads with data sizes ranging from GB to TB. The results show that the search latency of MRTuner is a few orders of magnitude lower than that of the state-of-the-art cost-based optimizer, and that the optimized execution plans are also significantly more effective.
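The abstract does not spell out how the PTC model works. As a hedged illustration only, and not MRTuner's actual formulation, the toy sketch below assumes that producers (map tasks), a transporter (the shuffle over the network) and consumers (reduce tasks) overlap as a pipeline, so the slowest stage bounds job time, and that a fixed pool of task slots must be split between map and reduce tasks. The function names `ptc_time` and `best_slot_split`, the slot pool, and all rates are hypothetical. The exhaustive loop is a naive baseline for contrast, not the paper's efficient search algorithm.

```python
def ptc_time(data_gb, produce_rate, transport_rate, consume_rate, overlap=True):
    """Toy job-time estimate (seconds) for data flowing through three stages.

    Hypothetical model: rates are in GB/s. With overlap, the stages run as a
    pipeline and the slowest stage bounds throughput; without overlap, the
    stage times simply add up.
    """
    stage_times = [data_gb / r for r in (produce_rate, transport_rate, consume_rate)]
    return max(stage_times) if overlap else sum(stage_times)


def best_slot_split(data_gb, total_slots, per_map_rate, per_reduce_rate, network_rate):
    """Naively try every split of a hypothetical slot pool between maps and reduces.

    More map slots speed up the producer stage but leave fewer slots for the
    consumer stage, which is the kind of tradeoff a PTC-style model captures.
    Returns (estimated_time, map_slots, reduce_slots) for the fastest split.
    """
    best = None
    for maps in range(1, total_slots):
        reduces = total_slots - maps
        t = ptc_time(data_gb, maps * per_map_rate, network_rate, reduces * per_reduce_rate)
        if best is None or t < best[0]:
            best = (t, maps, reduces)
    return best


if __name__ == "__main__":
    # 100 GB job, 10 slots, 1 GB/s per task, fast network: an even split wins.
    print(best_slot_split(100, 10, 1.0, 1.0, 100.0))  # (20.0, 5, 5)
```

With a fast network the balanced 5/5 split is best, because map and reduce stages then finish at the same rate; if `network_rate` were the bottleneck, many splits would tie, which is why a real tuner must model all three stages jointly rather than one knob at a time.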

Original language: English (US)
Pages (from-to): 1319-1330
Number of pages: 12
Journal: Proceedings of the VLDB Endowment
Volume: 7
Issue number: 13
State: Published - 2014
Externally published: Yes
Event: 40th International Conference on Very Large Data Bases, VLDB 2014, Hangzhou, China
Duration: Sep 1 2014 to Sep 5 2014

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • General Computer Science
