Toward Multi-FPGA Acceleration of the Neural Networks

Saman Biookaghazadeh, Pravin Kumar Ravi, Ming Zhao

Research output: Contribution to journalArticlepeer-review

11 Scopus citations


High-throughput and low-latency Convolutional Neural Network (CNN) inference is increasingly important for many cloud- and edge-computing applications. FPGA-based acceleration of CNN inference has demonstrated various benefits compared to other high-performance devices such as GPGPUs. Current FPGA CNN-acceleration solutions are based on a single FPGA design, which are limited by the available resources on an FPGA. In addition, they can only accelerate conventional 2D neural networks. To address these limitations, we present a generic multi-FPGA solution, written in OpenCL, which can accelerate more complex CNNs (e.g., C3D CNN) and achieve a near linear speedup with respect to the available single-FPGA solutions. The design is built upon the Intel Deep Learning Accelerator architecture, with three extensions. First, it includes updates for better area efficiency (up to 25%) and higher performance (up to 24%). Second, it supports 3D convolutions for more challenging applications such as video learning. Third, it supports multi-FPGA communication for higher inference throughput. The results show that utilizing multiple FPGAs can linearly increase the overall bandwidth while maintaining the same end-to-end latency. In addition, the design can outperform other FPGA 2D accelerators by up to 8.4 times and 3D accelerators by up to 1.7 times.

Original languageEnglish (US)
Article number25
JournalACM Journal on Emerging Technologies in Computing Systems
Issue number2
StatePublished - Apr 5 2021


  • FPGA
  • distributed systems
  • neural networks

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Electrical and Electronic Engineering


Dive into the research topics of 'Toward Multi-FPGA Acceleration of the Neural Networks'. Together they form a unique fingerprint.

Cite this