TY - GEN
T1 - End-to-end scalable FPGA accelerator for deep residual networks
AU - Ma, Yufei
AU - Kim, Minkyu
AU - Cao, Yu
AU - Vrudhula, Sarma
AU - Seo, Jae-sun
N1 - Funding Information:
This work was supported in part by the NSF I/UCRC Center for Embedded Systems through NSF grants 1361926 and 1535669, and by the Samsung Advanced Institute of Technology.
Publisher Copyright:
© 2017 IEEE.
PY - 2017/9/25
Y1 - 2017/9/25
N2 - This work presents an efficient hardware accelerator design for deep residual learning algorithms, which have shown superior image recognition accuracy (>90% top-5 accuracy on the ImageNet database). Two key objectives of the acceleration strategy are to (1) maximize resource utilization and minimize data movement, and (2) employ scalable and reusable computing primitives to optimize the physical design under hardware constraints. Furthermore, we present techniques for efficient integration and communication of these primitives in deep residual convolutional neural networks (CNNs), which exhibit complex, non-uniform layer connections. The proposed hardware accelerator efficiently implements the state-of-the-art ResNet-50/152 algorithms on an Arria-10 FPGA, demonstrating 285.1/315.5 GOPS of throughput and 27.2/71.7 ms of latency, respectively.
AB - This work presents an efficient hardware accelerator design for deep residual learning algorithms, which have shown superior image recognition accuracy (>90% top-5 accuracy on the ImageNet database). Two key objectives of the acceleration strategy are to (1) maximize resource utilization and minimize data movement, and (2) employ scalable and reusable computing primitives to optimize the physical design under hardware constraints. Furthermore, we present techniques for efficient integration and communication of these primitives in deep residual convolutional neural networks (CNNs), which exhibit complex, non-uniform layer connections. The proposed hardware accelerator efficiently implements the state-of-the-art ResNet-50/152 algorithms on an Arria-10 FPGA, demonstrating 285.1/315.5 GOPS of throughput and 27.2/71.7 ms of latency, respectively.
KW - Convolutional neural networks
KW - Deep learning
KW - Deep residual networks
KW - FPGA
KW - Hardware acceleration
UR - http://www.scopus.com/inward/record.url?scp=85032694855&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85032694855&partnerID=8YFLogxK
U2 - 10.1109/ISCAS.2017.8050344
DO - 10.1109/ISCAS.2017.8050344
M3 - Conference contribution
AN - SCOPUS:85032694855
T3 - Proceedings - IEEE International Symposium on Circuits and Systems
BT - IEEE International Symposium on Circuits and Systems
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 50th IEEE International Symposium on Circuits and Systems, ISCAS 2017
Y2 - 28 May 2017 through 31 May 2017
ER -