Toward Multi-FPGA Acceleration of the Neural Networks

Saman Biookaghazadeh; Pravin Kumar Ravi; Ming Zhao

doi:10.1145/3432816

Research output: Contribution to journal › Article › peer-review

18 Scopus citations

Abstract

High-throughput and low-latency Convolutional Neural Network (CNN) inference is increasingly important for many cloud- and edge-computing applications. FPGA-based acceleration of CNN inference has demonstrated various benefits compared to other high-performance devices such as GPGPUs. Current FPGA CNN-acceleration solutions are based on single-FPGA designs, which are limited by the resources available on one FPGA. In addition, they can only accelerate conventional 2D neural networks. To address these limitations, we present a generic multi-FPGA solution, written in OpenCL, which can accelerate more complex CNNs (e.g., C3D CNN) and achieve a near-linear speedup with respect to the available single-FPGA solutions. The design is built upon the Intel Deep Learning Accelerator architecture, with three extensions. First, it includes updates for better area efficiency (up to 25%) and higher performance (up to 24%). Second, it supports 3D convolutions for more challenging applications such as video learning. Third, it supports multi-FPGA communication for higher inference throughput. The results show that utilizing multiple FPGAs can linearly increase the overall bandwidth while maintaining the same end-to-end latency. In addition, the design can outperform other FPGA 2D accelerators by up to 8.4 times and 3D accelerators by up to 1.7 times.

Original language	English (US)
Article number	25
Journal	ACM Journal on Emerging Technologies in Computing Systems
Volume	17
Issue number	2
DOIs	https://doi.org/10.1145/3432816
State	Published - Apr 5 2021

Keywords

FPGA
distributed systems
neural networks

ASJC Scopus subject areas

Software
Hardware and Architecture
Electrical and Electronic Engineering

Cite this

@article{073f88204a1449c69e8358e9c1300e2a,

title = "Toward Multi-FPGA Acceleration of the Neural Networks",

abstract = "High-throughput and low-latency Convolutional Neural Network (CNN) inference is increasingly important for many cloud- and edge-computing applications. FPGA-based acceleration of CNN inference has demonstrated various benefits compared to other high-performance devices such as GPGPUs. Current FPGA CNN-acceleration solutions are based on a single FPGA design, which are limited by the available resources on an FPGA. In addition, they can only accelerate conventional 2D neural networks. To address these limitations, we present a generic multi-FPGA solution, written in OpenCL, which can accelerate more complex CNNs (e.g., C3D CNN) and achieve a near linear speedup with respect to the available single-FPGA solutions. The design is built upon the Intel Deep Learning Accelerator architecture, with three extensions. First, it includes updates for better area efficiency (up to 25\%) and higher performance (up to 24\%). Second, it supports 3D convolutions for more challenging applications such as video learning. Third, it supports multi-FPGA communication for higher inference throughput. The results show that utilizing multiple FPGAs can linearly increase the overall bandwidth while maintaining the same end-to-end latency. In addition, the design can outperform other FPGA 2D accelerators by up to 8.4 times and 3D accelerators by up to 1.7 times.",

keywords = "FPGA, distributed systems, neural networks",

author = "Saman Biookaghazadeh and Ravi, {Pravin Kumar} and Ming Zhao",

note = "Funding Information: This work was supported by National Science Foundation awards CNS-1619653, CNS-1562837, CNS-1629888, and CNS-1955593. Authors{\textquoteright} address: S. Biookaghazadeh, P. K. Ravi, and M. Zhao, Arizona State University, School of Computing, Informatics, and Decision Systems Engineering, 699 S. Mill Avenue, Tempe, AZ 85281; emails: \{sbiookag, pravi8, mingzhao\}@asu.edu. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. {\textcopyright} 2021 Association for Computing Machinery. 1550-4832/2021/04-ART25 \$15.00 https://doi.org/10.1145/3432816 Publisher Copyright: {\textcopyright} 2021 ACM.",

year = "2021",

month = apr,

day = "5",

doi = "10.1145/3432816",

language = "English (US)",

volume = "17",

journal = "ACM Journal on Emerging Technologies in Computing Systems",

issn = "1550-4832",

publisher = "Association for Computing Machinery (ACM)",

number = "2",

}

TY - JOUR

T1 - Toward Multi-FPGA Acceleration of the Neural Networks

AU - Biookaghazadeh, Saman

AU - Ravi, Pravin Kumar

AU - Zhao, Ming

N1 - Funding Information: This work was supported by National Science Foundation awards CNS-1619653, CNS-1562837, CNS-1629888, and CNS-1955593. Authors’ address: S. Biookaghazadeh, P. K. Ravi, and M. Zhao, Arizona State University, School of Computing, Informatics, and Decision Systems Engineering, 699 S. Mill Avenue, Tempe, AZ 85281; emails: {sbiookag, pravi8, mingzhao}@asu.edu. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. © 2021 Association for Computing Machinery. 1550-4832/2021/04-ART25 $15.00 https://doi.org/10.1145/3432816 Publisher Copyright: © 2021 ACM.

PY - 2021/4/5

Y1 - 2021/4/5

N2 - High-throughput and low-latency Convolutional Neural Network (CNN) inference is increasingly important for many cloud- and edge-computing applications. FPGA-based acceleration of CNN inference has demonstrated various benefits compared to other high-performance devices such as GPGPUs. Current FPGA CNN-acceleration solutions are based on a single FPGA design, which are limited by the available resources on an FPGA. In addition, they can only accelerate conventional 2D neural networks. To address these limitations, we present a generic multi-FPGA solution, written in OpenCL, which can accelerate more complex CNNs (e.g., C3D CNN) and achieve a near linear speedup with respect to the available single-FPGA solutions. The design is built upon the Intel Deep Learning Accelerator architecture, with three extensions. First, it includes updates for better area efficiency (up to 25%) and higher performance (up to 24%). Second, it supports 3D convolutions for more challenging applications such as video learning. Third, it supports multi-FPGA communication for higher inference throughput. The results show that utilizing multiple FPGAs can linearly increase the overall bandwidth while maintaining the same end-to-end latency. In addition, the design can outperform other FPGA 2D accelerators by up to 8.4 times and 3D accelerators by up to 1.7 times.

KW - FPGA

KW - distributed systems

KW - neural networks

UR - http://www.scopus.com/inward/record.url?scp=85105214298&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85105214298&partnerID=8YFLogxK

U2 - 10.1145/3432816

DO - 10.1145/3432816

M3 - Article

AN - SCOPUS:85105214298

SN - 1550-4832

VL - 17

JO - ACM Journal on Emerging Technologies in Computing Systems

JF - ACM Journal on Emerging Technologies in Computing Systems

IS - 2

M1 - 25

ER -
