TY - GEN
T1 - Throughput-optimized OpenCL-based FPGA accelerator for large-scale convolutional neural networks
AU - Suda, Naveen
AU - Chandra, Vikas
AU - Dasika, Ganesh
AU - Mohanty, Abinash
AU - Ma, Yufei
AU - Vrudhula, Sarma
AU - Seo, Jae-sun
AU - Cao, Yu
N1 - Publisher Copyright:
© 2016 ACM.
PY - 2016/2/21
Y1 - 2016/2/21
N2 - Convolutional Neural Networks (CNNs) have gained popularity in many computer vision applications such as image classification, face detection, and video analysis, because of their ability to train and classify with high accuracy. Due to multiple convolution and fully-connected layers that are compute- and memory-intensive, it is difficult to perform real-time classification with low power consumption on today's computing systems. FPGAs have been widely explored as hardware accelerators for CNNs because of their reconfigurability and energy efficiency, as well as fast turnaround time, especially with high-level synthesis methodologies. Previous FPGA-based CNN accelerators, however, typically implemented generic accelerators agnostic to the CNN configuration, where the reconfigurable capabilities of FPGAs are not fully leveraged to maximize the overall system throughput. In this work, we present a systematic design space exploration methodology to maximize the throughput of an OpenCL-based FPGA accelerator for a given CNN model, considering the FPGA resource constraints such as on-chip memory, registers, computational resources, and external memory bandwidth. The proposed methodology is demonstrated by optimizing two representative large-scale CNNs, AlexNet and VGG, on two Altera Stratix-V FPGA platforms, the DE5-Net and P395-D8 boards, which have different hardware resources. We achieve a peak performance of 136.5 GOPS for the convolution operations, and 117.8 GOPS for the entire VGG network that performs ImageNet classification on the P395-D8 board.
AB - Convolutional Neural Networks (CNNs) have gained popularity in many computer vision applications such as image classification, face detection, and video analysis, because of their ability to train and classify with high accuracy. Due to multiple convolution and fully-connected layers that are compute- and memory-intensive, it is difficult to perform real-time classification with low power consumption on today's computing systems. FPGAs have been widely explored as hardware accelerators for CNNs because of their reconfigurability and energy efficiency, as well as fast turnaround time, especially with high-level synthesis methodologies. Previous FPGA-based CNN accelerators, however, typically implemented generic accelerators agnostic to the CNN configuration, where the reconfigurable capabilities of FPGAs are not fully leveraged to maximize the overall system throughput. In this work, we present a systematic design space exploration methodology to maximize the throughput of an OpenCL-based FPGA accelerator for a given CNN model, considering the FPGA resource constraints such as on-chip memory, registers, computational resources, and external memory bandwidth. The proposed methodology is demonstrated by optimizing two representative large-scale CNNs, AlexNet and VGG, on two Altera Stratix-V FPGA platforms, the DE5-Net and P395-D8 boards, which have different hardware resources. We achieve a peak performance of 136.5 GOPS for the convolution operations, and 117.8 GOPS for the entire VGG network that performs ImageNet classification on the P395-D8 board.
KW - Convolutional neural networks
KW - FPGA
KW - OpenCL
KW - Optimization
UR - http://www.scopus.com/inward/record.url?scp=84966471227&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84966471227&partnerID=8YFLogxK
U2 - 10.1145/2847263.2847276
DO - 10.1145/2847263.2847276
M3 - Conference contribution
AN - SCOPUS:84966471227
T3 - FPGA 2016 - Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
SP - 16
EP - 25
BT - FPGA 2016 - Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
PB - Association for Computing Machinery, Inc
T2 - 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, FPGA 2016
Y2 - 21 February 2016 through 23 February 2016
ER -