TY - JOUR
T1 - Performance modeling for CNN inference accelerators on FPGA
AU - Ma, Yufei
AU - Cao, Yu
AU - Vrudhula, Sarma
AU - Seo, Jae-sun
N1 - Funding Information:
Manuscript received August 27, 2018; revised December 6, 2018; accepted January 17, 2019. Date of publication February 4, 2019; date of current version March 18, 2020. This work was supported in part by the NSF I/UCRC Center for Embedded Systems through NSF under Grant 1230401, Grant 1237856, Grant 1701241, Grant 1361926, Grant 1535669, Grant 1652866, and Grant 1715443, in part by the Intel Labs, and in part by the Center for Brain-Inspired Computing (C-BRIC), one of six centers in JUMP, an SRC program sponsored by DARPA. This paper was recommended by Associate Editor Y. Wang. (Corresponding author: Yufei Ma.) Y. Ma, Y. Cao, and J.-S. Seo are with the School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ 85287 USA (e-mail: yufeima@asu.edu; yu.cao@asu.edu; jaesun.seo@asu.edu).
Publisher Copyright:
© 1982-2012 IEEE.
PY - 2020/4
Y1 - 2020/4
N2 - The recently reported successes of convolutional neural networks (CNNs) in many areas have generated wide interest in the development of field-programmable gate array (FPGA)-based accelerators. To achieve high performance and energy efficiency, an FPGA-based accelerator must fully utilize the limited computation resources and minimize the data communication and memory access, both of which are impacted and constrained by a variety of design parameters, e.g., the degree and dimension of parallelism, the size of on-chip buffers, the bandwidth of the external memory, and many more. The large design space of the accelerator makes it impractical to search for the optimal design in the implementation phase. To address this problem, a performance model is described to estimate the performance and resource utilization of an FPGA implementation. By this means, the performance bottleneck and design bound can be identified, and the optimal design option can be explored early in the design phase. The proposed performance model is validated on a variety of CNN algorithms by comparing its estimates with on-board test results on two different FPGAs.
KW - Analytical modeling
KW - convolutional neural networks (CNNs)
KW - field-programmable gate array (FPGA)
UR - http://www.scopus.com/inward/record.url?scp=85082390643&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85082390643&partnerID=8YFLogxK
U2 - 10.1109/TCAD.2019.2897634
DO - 10.1109/TCAD.2019.2897634
M3 - Article
AN - SCOPUS:85082390643
VL - 39
SP - 843
EP - 856
JO - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
JF - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
SN - 0278-0070
IS - 4
M1 - 8634939
ER -