Map task scheduling in MapReduce with data locality

Throughput and heavy-traffic optimality

Weina Wang, Kai Zhu, Lei Ying, Jian Tan, Li Zhang

Research output: Chapter in Book/Report/Conference proceedingConference contribution

55 Citations (Scopus)

Abstract

Scheduling map tasks to improve data locality is crucial to the performance of MapReduce. Many works have been devoted to increasing data locality for better efficiency. However, to the best of our knowledge, fundamental limits of MapReduce computing clusters with data locality, including the capacity region and theoretical bounds on the delay performance, have not been studied. In this paper, we address these problems from a stochastic network perspective. Our focus is to strike the right balance between data-locality and load-balancing to simultaneously maximize throughput and minimize delay. We present a new queueing architecture and propose a map task scheduling algorithm constituted by the Join the Shortest Queue policy together with the MaxWeight policy. We identify an outer bound on the capacity region, and then prove that the proposed algorithm stabilizes any arrival rate vector strictly within this outer bound. It shows that the algorithm is throughput optimal and the outer bound coincides with the actual capacity region. Further, we study the number of backlogged tasks under the proposed algorithm, which is directly related to the delay performance based on Little's law. We prove that the proposed algorithm is heavy-traffic optimal, i.e., it asymptotically minimizes the number of backlogged tasks as the arrival rate vector approaches the boundary of the capacity region. Therefore, the proposed algorithm is also delay optimal in the heavy-traffic regime.

Original languageEnglish (US)
Title of host publicationProceedings - IEEE INFOCOM
Pages1609-1617
Number of pages9
DOIs
StatePublished - 2013
Event32nd IEEE Conference on Computer Communications, IEEE INFOCOM 2013 - Turin, Italy
Duration: Apr 14 2013Apr 19 2013

Other

Other32nd IEEE Conference on Computer Communications, IEEE INFOCOM 2013
CountryItaly
CityTurin
Period4/14/134/19/13

Fingerprint

Scheduling
Throughput
Cluster computing
Scheduling algorithms
Resource allocation

ASJC Scopus subject areas

  • Computer Science(all)
  • Electrical and Electronic Engineering

Cite this

Wang, W., Zhu, K., Ying, L., Tan, J., & Zhang, L. (2013). Map task scheduling in MapReduce with data locality: Throughput and heavy-traffic optimality. In Proceedings - IEEE INFOCOM (pp. 1609-1617). [6566957] https://doi.org/10.1109/INFCOM.2013.6566957

Map task scheduling in MapReduce with data locality : Throughput and heavy-traffic optimality. / Wang, Weina; Zhu, Kai; Ying, Lei; Tan, Jian; Zhang, Li.

Proceedings - IEEE INFOCOM. 2013. p. 1609-1617 6566957.

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Wang, W, Zhu, K, Ying, L, Tan, J & Zhang, L 2013, Map task scheduling in MapReduce with data locality: Throughput and heavy-traffic optimality. in Proceedings - IEEE INFOCOM., 6566957, pp. 1609-1617, 32nd IEEE Conference on Computer Communications, IEEE INFOCOM 2013, Turin, Italy, 4/14/13. https://doi.org/10.1109/INFCOM.2013.6566957
Wang, Weina ; Zhu, Kai ; Ying, Lei ; Tan, Jian ; Zhang, Li. / Map task scheduling in MapReduce with data locality : Throughput and heavy-traffic optimality. Proceedings - IEEE INFOCOM. 2013. pp. 1609-1617
@inproceedings{c14f2592d4e842cd9254a49d5619884f,
title = "Map task scheduling in MapReduce with data locality: Throughput and heavy-traffic optimality",
abstract = "Scheduling map tasks to improve data locality is crucial to the performance of MapReduce. Many works have been devoted to increasing data locality for better efficiency. However, to the best of our knowledge, fundamental limits of MapReduce computing clusters with data locality, including the capacity region and theoretical bounds on the delay performance, have not been studied. In this paper, we address these problems from a stochastic network perspective. Our focus is to strike the right balance between data-locality and load-balancing to simultaneously maximize throughput and minimize delay. We present a new queueing architecture and propose a map task scheduling algorithm constituted by the Join the Shortest Queue policy together with the MaxWeight policy. We identify an outer bound on the capacity region, and then prove that the proposed algorithm stabilizes any arrival rate vector strictly within this outer bound. It shows that the algorithm is throughput optimal and the outer bound coincides with the actual capacity region. Further, we study the number of backlogged tasks under the proposed algorithm, which is directly related to the delay performance based on Little's law. We prove that the proposed algorithm is heavy-traffic optimal, i.e., it asymptotically minimizes the number of backlogged tasks as the arrival rate vector approaches the boundary of the capacity region. Therefore, the proposed algorithm is also delay optimal in the heavy-traffic regime.",
author = "Weina Wang and Kai Zhu and Lei Ying and Jian Tan and Li Zhang",
year = "2013",
doi = "10.1109/INFCOM.2013.6566957",
language = "English (US)",
isbn = "9781467359467",
pages = "1609--1617",
booktitle = "Proceedings - IEEE INFOCOM",

}

TY - GEN

T1 - Map task scheduling in MapReduce with data locality

T2 - Throughput and heavy-traffic optimality

AU - Wang, Weina

AU - Zhu, Kai

AU - Ying, Lei

AU - Tan, Jian

AU - Zhang, Li

PY - 2013

Y1 - 2013

N2 - Scheduling map tasks to improve data locality is crucial to the performance of MapReduce. Many works have been devoted to increasing data locality for better efficiency. However, to the best of our knowledge, fundamental limits of MapReduce computing clusters with data locality, including the capacity region and theoretical bounds on the delay performance, have not been studied. In this paper, we address these problems from a stochastic network perspective. Our focus is to strike the right balance between data-locality and load-balancing to simultaneously maximize throughput and minimize delay. We present a new queueing architecture and propose a map task scheduling algorithm constituted by the Join the Shortest Queue policy together with the MaxWeight policy. We identify an outer bound on the capacity region, and then prove that the proposed algorithm stabilizes any arrival rate vector strictly within this outer bound. It shows that the algorithm is throughput optimal and the outer bound coincides with the actual capacity region. Further, we study the number of backlogged tasks under the proposed algorithm, which is directly related to the delay performance based on Little's law. We prove that the proposed algorithm is heavy-traffic optimal, i.e., it asymptotically minimizes the number of backlogged tasks as the arrival rate vector approaches the boundary of the capacity region. Therefore, the proposed algorithm is also delay optimal in the heavy-traffic regime.

AB - Scheduling map tasks to improve data locality is crucial to the performance of MapReduce. Many works have been devoted to increasing data locality for better efficiency. However, to the best of our knowledge, fundamental limits of MapReduce computing clusters with data locality, including the capacity region and theoretical bounds on the delay performance, have not been studied. In this paper, we address these problems from a stochastic network perspective. Our focus is to strike the right balance between data-locality and load-balancing to simultaneously maximize throughput and minimize delay. We present a new queueing architecture and propose a map task scheduling algorithm constituted by the Join the Shortest Queue policy together with the MaxWeight policy. We identify an outer bound on the capacity region, and then prove that the proposed algorithm stabilizes any arrival rate vector strictly within this outer bound. It shows that the algorithm is throughput optimal and the outer bound coincides with the actual capacity region. Further, we study the number of backlogged tasks under the proposed algorithm, which is directly related to the delay performance based on Little's law. We prove that the proposed algorithm is heavy-traffic optimal, i.e., it asymptotically minimizes the number of backlogged tasks as the arrival rate vector approaches the boundary of the capacity region. Therefore, the proposed algorithm is also delay optimal in the heavy-traffic regime.

UR - http://www.scopus.com/inward/record.url?scp=84883072667&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84883072667&partnerID=8YFLogxK

U2 - 10.1109/INFCOM.2013.6566957

DO - 10.1109/INFCOM.2013.6566957

M3 - Conference contribution

SN - 9781467359467

SP - 1609

EP - 1617

BT - Proceedings - IEEE INFOCOM

ER -