TY - JOUR
T1 - Toward Multi-FPGA Acceleration of the Neural Networks
AU - Biookaghazadeh, Saman
AU - Ravi, Pravin Kumar
AU - Zhao, Ming
N1 - Funding Information:
This work was supported by National Science Foundation awards CNS-1619653, CNS-1562837, CNS-1629888, and CNS-1955593. Authors’ address: S. Biookaghazadeh, P. K. Ravi, and M. Zhao, Arizona State University, School of Computing, Informatics, and Decision Systems Engineering, 699 S. Mill Avenue, Tempe, AZ 85281; emails: {sbiookag, pravi8, mingzhao}@asu.edu. © 2021 Association for Computing Machinery. 1550-4832/2021/04-ART25 https://doi.org/10.1145/3432816
Publisher Copyright:
© 2021 ACM.
PY - 2021/4/5
Y1 - 2021/4/5
N2 - High-throughput and low-latency Convolutional Neural Network (CNN) inference is increasingly important for many cloud- and edge-computing applications. FPGA-based acceleration of CNN inference has demonstrated various benefits compared to other high-performance devices such as GPGPUs. Current FPGA CNN-acceleration solutions are based on single-FPGA designs, which are limited by the resources available on one FPGA. In addition, they can only accelerate conventional 2D neural networks. To address these limitations, we present a generic multi-FPGA solution, written in OpenCL, which can accelerate more complex CNNs (e.g., the C3D CNN) and achieve a near-linear speedup over the available single-FPGA solutions. The design is built upon the Intel Deep Learning Accelerator architecture, with three extensions. First, it includes updates for better area efficiency (up to 25%) and higher performance (up to 24%). Second, it supports 3D convolutions for more challenging applications such as video learning. Third, it supports multi-FPGA communication for higher inference throughput. The results show that utilizing multiple FPGAs can linearly increase the overall bandwidth while maintaining the same end-to-end latency. In addition, the design can outperform other FPGA 2D accelerators by up to 8.4 times and 3D accelerators by up to 1.7 times.
AB - High-throughput and low-latency Convolutional Neural Network (CNN) inference is increasingly important for many cloud- and edge-computing applications. FPGA-based acceleration of CNN inference has demonstrated various benefits compared to other high-performance devices such as GPGPUs. Current FPGA CNN-acceleration solutions are based on single-FPGA designs, which are limited by the resources available on one FPGA. In addition, they can only accelerate conventional 2D neural networks. To address these limitations, we present a generic multi-FPGA solution, written in OpenCL, which can accelerate more complex CNNs (e.g., the C3D CNN) and achieve a near-linear speedup over the available single-FPGA solutions. The design is built upon the Intel Deep Learning Accelerator architecture, with three extensions. First, it includes updates for better area efficiency (up to 25%) and higher performance (up to 24%). Second, it supports 3D convolutions for more challenging applications such as video learning. Third, it supports multi-FPGA communication for higher inference throughput. The results show that utilizing multiple FPGAs can linearly increase the overall bandwidth while maintaining the same end-to-end latency. In addition, the design can outperform other FPGA 2D accelerators by up to 8.4 times and 3D accelerators by up to 1.7 times.
KW - FPGA
KW - distributed systems
KW - neural networks
UR - http://www.scopus.com/inward/record.url?scp=85105214298&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85105214298&partnerID=8YFLogxK
U2 - 10.1145/3432816
DO - 10.1145/3432816
M3 - Article
AN - SCOPUS:85105214298
SN - 1550-4832
VL - 17
JO - ACM Journal on Emerging Technologies in Computing Systems
JF - ACM Journal on Emerging Technologies in Computing Systems
IS - 2
M1 - 25
ER -