TY - GEN
T1 - Comprehensive evaluation of OpenCL-based CNN implementations for FPGAs
AU - Tapiador-Morales, Ricardo
AU - Rios-Navarro, Antonio
AU - Linares-Barranco, Alejandro
AU - Kim, Minkyu
AU - Kadetotad, Deepak
AU - Seo, Jae-sun
N1 - Publisher Copyright:
© Springer International Publishing AG 2017.
PY - 2017
Y1 - 2017
N2 - Deep learning has significantly advanced the state of the art in artificial intelligence, gaining wide popularity in both industry and academia. Special interest surrounds Convolutional Neural Networks (CNNs), which take inspiration from the hierarchical structure of the visual cortex to form deep layers of convolutional operations, along with fully connected classifiers. Hardware implementations of these deep CNN architectures face memory bottlenecks: the many convolutional and fully-connected layers demand large amounts of communication for parallel computation. Multi-core CPU-based solutions have proven inadequate for this problem due to the memory wall and low parallelism. Many-core GPU architectures show superior performance, but they consume high power and also face memory constraints due to inconsistencies between cache and main memory. OpenCL is commonly used to describe these architectures for execution on GPGPUs or FPGAs. FPGA design solutions are also being actively explored; they allow implementing the memory hierarchy using embedded parallel BlockRAMs, which boosts the parallel use of shared memory elements between multiple processing units, avoiding data replication and inconsistencies. This makes FPGAs potentially powerful solutions for real-time CNN classification. In this paper, the OpenCL co-design frameworks adopted by Altera and Xilinx for pseudo-automatic development are evaluated. A comprehensive evaluation and comparison for a 5-layer deep CNN is presented. Hardware resources, temporal performance, and the OpenCL architecture for CNNs are discussed. Xilinx demonstrates faster synthesis, better FPGA resource utilization, and more compact boards. Altera provides multi-platform tools, a mature design community, and better execution times.
KW - Altera
KW - Caffe
KW - Convolutional Neural Network
KW - Deep learning
KW - FPGA
KW - Hardware acceleration
KW - OpenCL
KW - Xilinx
UR - http://www.scopus.com/inward/record.url?scp=85020881750&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85020881750&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-59147-6_24
DO - 10.1007/978-3-319-59147-6_24
M3 - Conference contribution
AN - SCOPUS:85020881750
SN - 9783319591469
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 271
EP - 282
BT - Advances in Computational Intelligence - 14th International Work-Conference on Artificial Neural Networks, IWANN 2017, Proceedings
A2 - Rojas, Ignacio
A2 - Catala, Andreu
A2 - Joya, Gonzalo
PB - Springer Verlag
T2 - 14th International Work-Conference on Artificial Neural Networks, IWANN 2017
Y2 - 14 June 2017 through 16 June 2017
ER -