Energy-efficient acceleration of MapReduce applications using FPGAs

Katayoun Neshatpour; Maria Malik; Avesta Sasan; Setareh Rafatirad; Tinoush Mohsenin; Hassan Ghasemzadeh; Houman Homayoun

doi:10.1016/j.jpdc.2018.02.004

Research output: Contribution to journal › Article › peer-review

15 Scopus citations

Abstract

In this paper, we present a full end-to-end implementation of big data analytics applications in a heterogeneous CPU+FPGA architecture. Selecting the optimal architecture that results in the highest acceleration for big data applications requires an in-depth analysis of each application. Thus, we develop the MapReduce implementation of K-means, K-nearest neighbor, support vector machine and naive Bayes in a Hadoop Streaming environment that allows developing mapper functions in a non-Java-based language suited for interfacing with an FPGA-based hardware acceleration environment. We further profile various components of Hadoop MapReduce to identify candidates for hardware acceleration. We accelerate the mapper functions through hardware+software (HW+SW) co-design. Moreover, we study how various parameters at the application (size of input data), system (number of mappers running simultaneously per node and data split size), and architecture (choice of CPU core such as big vs little, e.g., Xeon vs Atom) levels affect the performance and power-efficiency benefits of Hadoop Streaming hardware acceleration and the overall performance and energy-efficiency of the system. Speedup and energy-efficiency gains of up to 8.3× and 15×, respectively, are achieved in an end-to-end Hadoop implementation. Our results show that HW+SW acceleration yields significantly higher speedup on the Atom server, reducing the performance gap between little and big cores after the acceleration. On the other hand, HW+SW acceleration reduces the power consumption of the Xeon server more significantly, reducing the power gap between little and big cores. Our cost analysis shows that the FPGA-accelerated Atom server yields execution times that are close to or even lower than those of a stand-alone Xeon server for the studied applications, while reducing the server cost by more than 3×. We confirm the scalability of FPGA acceleration of MapReduce by increasing the data size on a 12-node Xeon cluster and show that FPGA acceleration maintains its benefit for larger data sizes on a cluster.
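
For readers unfamiliar with Hadoop Streaming, the following is a minimal sketch of what a non-Java mapper for the K-means assignment step might look like. It is not the authors' implementation: the choice of Python, the comma-separated input format, the hard-coded `CENTROIDS`, and the function names `nearest_centroid`, `map_lines` and `run_mapper` are all assumptions made for illustration. It only shows the contract a streaming mapper follows, which is reading records from standard input and writing tab-separated key/value lines to standard output.

```python
import sys
from typing import Iterable, Iterator, Sequence, TextIO

# Hypothetical centroids for illustration only; a real job would load the
# current centroids from a side file distributed with the job.
CENTROIDS = [(0.0, 0.0), (5.0, 5.0)]


def nearest_centroid(point: Sequence[float],
                     centroids: Sequence[Sequence[float]]) -> int:
    """Return the index of the centroid closest to `point` (squared Euclidean)."""
    best_index = 0
    best_dist = float("inf")
    for index, centroid in enumerate(centroids):
        dist = sum((p - c) ** 2 for p, c in zip(point, centroid))
        if dist < best_dist:  # strict comparison: ties keep the first centroid
            best_index, best_dist = index, dist
    return best_index


def map_lines(lines: Iterable[str],
              centroids: Sequence[Sequence[float]]) -> Iterator[str]:
    """Turn each 'x1,x2,...' record into a 'cluster_id<TAB>record' line."""
    for line in lines:
        record = line.strip()
        if not record:
            continue  # skip blank lines
        point = [float(value) for value in record.split(",")]
        yield f"{nearest_centroid(point, centroids)}\t{record}"


def run_mapper(stream_in: TextIO, stream_out: TextIO) -> None:
    """Streaming entry point: records in on one stream, key/value lines out."""
    for output in map_lines(stream_in, CENTROIDS):
        stream_out.write(output + "\n")
```

A script used as a streaming mapper would end with a call such as `run_mapper(sys.stdin, sys.stdout)`. In the setting the abstract describes, the distance computation is the kind of mapper work that could be handed to an accelerator, but how the authors partition the work between hardware and software is not specified here.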

Original language	English (US)
Pages (from-to)	1-17
Number of pages	17
Journal	Journal of Parallel and Distributed Computing
Volume	119
DOIs	https://doi.org/10.1016/j.jpdc.2018.02.004
State	Published - Sep 2018
Externally published	Yes

Keywords

FPGA
Hadoop
Hardware+software co-design
Machine learning
MapReduce
Zynq boards

ASJC Scopus subject areas

Software
Theoretical Computer Science
Hardware and Architecture
Computer Networks and Communications
Artificial Intelligence

Access to Document

10.1016/j.jpdc.2018.02.004

Cite this

@article{067e2c3c2071433b86dc4e921b6061c2,

title = "Energy-efficient acceleration of MapReduce applications using FPGAs",

abstract = "In this paper, we present a full end-to-end implementation of big data analytics applications in a heterogeneous CPU+FPGA architecture. Selecting the optimal architecture that results in the highest acceleration for big data applications requires an in-depth analysis of each application. Thus, we develop the MapReduce implementation of K-means, K-nearest neighbor, support vector machine and naive Bayes in a Hadoop Streaming environment that allows developing mapper functions in a non-Java-based language suited for interfacing with an FPGA-based hardware acceleration environment. We further profile various components of Hadoop MapReduce to identify candidates for hardware acceleration. We accelerate the mapper functions through hardware+software (HW+SW) co-design. Moreover, we study how various parameters at the application (size of input data), system (number of mappers running simultaneously per node and data split size), and architecture (choice of CPU core such as big vs little, e.g., Xeon vs Atom) levels affect the performance and power-efficiency benefits of Hadoop Streaming hardware acceleration and the overall performance and energy-efficiency of the system. Speedup and energy-efficiency gains of up to 8.3× and 15×, respectively, are achieved in an end-to-end Hadoop implementation. Our results show that HW+SW acceleration yields significantly higher speedup on the Atom server, reducing the performance gap between little and big cores after the acceleration. On the other hand, HW+SW acceleration reduces the power consumption of the Xeon server more significantly, reducing the power gap between little and big cores. Our cost analysis shows that the FPGA-accelerated Atom server yields execution times that are close to or even lower than those of a stand-alone Xeon server for the studied applications, while reducing the server cost by more than 3×. We confirm the scalability of FPGA acceleration of MapReduce by increasing the data size on a 12-node Xeon cluster and show that FPGA acceleration maintains its benefit for larger data sizes on a cluster.",

keywords = "FPGA, Hadoop, Hardware+software co-design, Machine learning, MapReduce, Zynq boards",

author = "Katayoun Neshatpour and Maria Malik and Avesta Sasan and Setareh Rafatirad and Tinoush Mohsenin and Hassan Ghasemzadeh and Houman Homayoun",

note = "Publisher Copyright: {\textcopyright} 2018 Elsevier Inc.",

year = "2018",

month = sep,

doi = "10.1016/j.jpdc.2018.02.004",

language = "English (US)",

volume = "119",

pages = "1--17",

journal = "Journal of Parallel and Distributed Computing",

issn = "0743-7315",

publisher = "Academic Press Inc.",

}

TY - JOUR

T1 - Energy-efficient acceleration of MapReduce applications using FPGAs

AU - Neshatpour, Katayoun

AU - Malik, Maria

AU - Sasan, Avesta

AU - Rafatirad, Setareh

AU - Mohsenin, Tinoush

AU - Ghasemzadeh, Hassan

AU - Homayoun, Houman

PY - 2018/9

Y1 - 2018/9

N2 - In this paper, we present a full end-to-end implementation of big data analytics applications in a heterogeneous CPU+FPGA architecture. Selecting the optimal architecture that results in the highest acceleration for big data applications requires an in-depth analysis of each application. Thus, we develop the MapReduce implementation of K-means, K-nearest neighbor, support vector machine and naive Bayes in a Hadoop Streaming environment that allows developing mapper functions in a non-Java-based language suited for interfacing with an FPGA-based hardware acceleration environment. We further profile various components of Hadoop MapReduce to identify candidates for hardware acceleration. We accelerate the mapper functions through hardware+software (HW+SW) co-design. Moreover, we study how various parameters at the application (size of input data), system (number of mappers running simultaneously per node and data split size), and architecture (choice of CPU core such as big vs little, e.g., Xeon vs Atom) levels affect the performance and power-efficiency benefits of Hadoop Streaming hardware acceleration and the overall performance and energy-efficiency of the system. Speedup and energy-efficiency gains of up to 8.3× and 15×, respectively, are achieved in an end-to-end Hadoop implementation. Our results show that HW+SW acceleration yields significantly higher speedup on the Atom server, reducing the performance gap between little and big cores after the acceleration. On the other hand, HW+SW acceleration reduces the power consumption of the Xeon server more significantly, reducing the power gap between little and big cores. Our cost analysis shows that the FPGA-accelerated Atom server yields execution times that are close to or even lower than those of a stand-alone Xeon server for the studied applications, while reducing the server cost by more than 3×. We confirm the scalability of FPGA acceleration of MapReduce by increasing the data size on a 12-node Xeon cluster and show that FPGA acceleration maintains its benefit for larger data sizes on a cluster.

AB - In this paper, we present a full end-to-end implementation of big data analytics applications in a heterogeneous CPU+FPGA architecture. Selecting the optimal architecture that results in the highest acceleration for big data applications requires an in-depth analysis of each application. Thus, we develop the MapReduce implementation of K-means, K-nearest neighbor, support vector machine and naive Bayes in a Hadoop Streaming environment that allows developing mapper functions in a non-Java-based language suited for interfacing with an FPGA-based hardware acceleration environment. We further profile various components of Hadoop MapReduce to identify candidates for hardware acceleration. We accelerate the mapper functions through hardware+software (HW+SW) co-design. Moreover, we study how various parameters at the application (size of input data), system (number of mappers running simultaneously per node and data split size), and architecture (choice of CPU core such as big vs little, e.g., Xeon vs Atom) levels affect the performance and power-efficiency benefits of Hadoop Streaming hardware acceleration and the overall performance and energy-efficiency of the system. Speedup and energy-efficiency gains of up to 8.3× and 15×, respectively, are achieved in an end-to-end Hadoop implementation. Our results show that HW+SW acceleration yields significantly higher speedup on the Atom server, reducing the performance gap between little and big cores after the acceleration. On the other hand, HW+SW acceleration reduces the power consumption of the Xeon server more significantly, reducing the power gap between little and big cores. Our cost analysis shows that the FPGA-accelerated Atom server yields execution times that are close to or even lower than those of a stand-alone Xeon server for the studied applications, while reducing the server cost by more than 3×. We confirm the scalability of FPGA acceleration of MapReduce by increasing the data size on a 12-node Xeon cluster and show that FPGA acceleration maintains its benefit for larger data sizes on a cluster.

KW - FPGA

KW - Hadoop

KW - Hardware+software co-design

KW - Machine learning

KW - MapReduce

KW - Zynq boards

UR - http://www.scopus.com/inward/record.url?scp=85045264459&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85045264459&partnerID=8YFLogxK

U2 - 10.1016/j.jpdc.2018.02.004

DO - 10.1016/j.jpdc.2018.02.004

M3 - Article

AN - SCOPUS:85045264459

SN - 0743-7315

VL - 119

SP - 1

EP - 17

JO - Journal of Parallel and Distributed Computing

JF - Journal of Parallel and Distributed Computing

ER -
