This work presents an efficient hardware accelerator design for deep residual learning algorithms, which have shown superior image recognition accuracy (>90% top-5 accuracy on the ImageNet database). The two key objectives of the acceleration strategy are to (1) maximize resource utilization and minimize data movement, and (2) employ scalable and reusable computing primitives to optimize physical design under hardware constraints. Furthermore, we present techniques for efficient integration and communication of these primitives in deep residual convolutional neural networks (CNNs), which exhibit complex, non-uniform layer connections. The proposed hardware accelerator efficiently implements the state-of-the-art ResNet-50/152 algorithms on an Arria-10 FPGA, demonstrating 285.1/315.5 GOPS of throughput and 27.2/71.7 ms of latency, respectively.
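As a quick consistency check on the reported figures (a sketch, not part of the paper): assuming the quoted latencies are per-frame, multiplying throughput by latency yields the implied workload per inference, which lands close to the roughly 7.7 GOP and 22.6 GOP commonly cited for a single ResNet-50 and ResNet-152 forward pass.

```python
# Sanity check: throughput (GOPS) x per-frame latency (s) ~= per-inference
# workload (GOP). Throughput/latency values are from the abstract; the
# ResNet-50/152 workload interpretation is an assumption for illustration.
reported = {
    "ResNet-50":  {"gops": 285.1, "latency_s": 0.0272},
    "ResNet-152": {"gops": 315.5, "latency_s": 0.0717},
}

for name, r in reported.items():
    giga_ops = r["gops"] * r["latency_s"]
    print(f"{name}: {giga_ops:.2f} GOP per inference")
```

The products come out to about 7.75 GOP and 22.62 GOP, i.e., the reported throughput and latency numbers are mutually consistent under a single-frame interpretation.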

Original language: English (US)
Title of host publication: IEEE International Symposium on Circuits and Systems
Subtitle of host publication: From Dreams to Innovation, ISCAS 2017 - Conference Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781467368520
State: Published - Sep 25 2017
Event: 50th IEEE International Symposium on Circuits and Systems, ISCAS 2017 - Baltimore, United States
Duration: May 28 2017 - May 31 2017

Publication series

Name: Proceedings - IEEE International Symposium on Circuits and Systems
ISSN (Print): 0271-4310


Other: 50th IEEE International Symposium on Circuits and Systems, ISCAS 2017
Country/Territory: United States


Keywords

  • Convolutional neural networks
  • Deep learning
  • Deep residual networks
  • FPGA
  • Hardware acceleration

ASJC Scopus subject areas

  • Electrical and Electronic Engineering


