Comprehensive evaluation of openCL-based CNN implementations for FPGAs

Ricardo Tapiador-Morales, Antonio Rios-Navarro, Alejandro Linares-Barranco, Minkyu Kim, Deepak Kadetotad, Jae-sun Seo

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Deep learning has significantly advanced the state of the art in artificial intelligence, gaining wide popularity in both industry and academia. Of special interest are Convolutional Neural Networks (CNNs), which take inspiration from the hierarchical structure of the visual cortex to form deep layers of convolutional operations, along with fully connected classifiers. Hardware implementations of these deep CNN architectures are challenged by memory bottlenecks, because their many convolutional and fully connected layers demand a large amount of communication for parallel computation. Multi-core CPU-based solutions have proven inadequate for this problem due to the memory wall and low parallelism. Many-core GPU architectures show superior performance, but they consume high power and also have memory constraints due to inconsistencies between cache and main memory. OpenCL is commonly used to describe these architectures for execution on GPGPUs or FPGAs. FPGA design solutions are also actively being explored; they allow the memory hierarchy to be implemented using embedded parallel BlockRAMs. This boosts the parallel use of shared memory elements between multiple processing units, avoiding data replication and inconsistencies, and makes FPGAs potentially powerful solutions for real-time CNN classification. In this paper, the OpenCL co-design frameworks adopted by both Altera and Xilinx for pseudo-automatic development are evaluated. A comprehensive evaluation and comparison for a 5-layer deep CNN is presented. Hardware resources, temporal performance and the OpenCL architecture for CNNs are discussed. Xilinx demonstrates faster synthesis, better FPGA resource utilization and more compact boards. Altera provides multi-platform tools, a mature design community and better execution times.

Original language: English (US)
Title of host publication: Advances in Computational Intelligence - 14th International Work-Conference on Artificial Neural Networks, IWANN 2017, Proceedings
Publisher: Springer Verlag
Pages: 271-282
Number of pages: 12
Volume: 10306 LNCS
ISBN (Print): 9783319591469
DOIs: https://doi.org/10.1007/978-3-319-59147-6_24
State: Published - 2017
Event: 16th International Conference on Artificial Intelligence and Soft Computing, ICAISC 2017 - Zakopane, Poland
Duration: Jun 11 2017 - Jun 15 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10306 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 16th International Conference on Artificial Intelligence and Soft Computing, ICAISC 2017
Country: Poland
City: Zakopane
Period: 6/11/17 - 6/15/17

Fingerprint

Comprehensive Evaluation
Field Programmable Gate Array
Field programmable gate arrays (FPGA)
Neural Networks
Neural networks
Data storage equipment
Inconsistency
GPGPU
Co-design
Visual Cortex
Memory Hierarchy
Resources
Many-core
Hardware Implementation
Parallel Computation
Network Architecture
Shared Memory
Hierarchical Structure
High Power
Cache

Keywords

  • Altera
  • Caffe
  • Convolutional Neural Network
  • Deep learning
  • FPGA
  • Hardware acceleration
  • OpenCL
  • Xilinx

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)

Cite this

Tapiador-Morales, R., Rios-Navarro, A., Linares-Barranco, A., Kim, M., Kadetotad, D., & Seo, J. (2017). Comprehensive evaluation of openCL-based CNN implementations for FPGAs. In Advances in Computational Intelligence - 14th International Work-Conference on Artificial Neural Networks, IWANN 2017, Proceedings (Vol. 10306 LNCS, pp. 271-282). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10306 LNCS). Springer Verlag. https://doi.org/10.1007/978-3-319-59147-6_24

Comprehensive evaluation of openCL-based CNN implementations for FPGAs. / Tapiador-Morales, Ricardo; Rios-Navarro, Antonio; Linares-Barranco, Alejandro; Kim, Minkyu; Kadetotad, Deepak; Seo, Jae-sun.

Advances in Computational Intelligence - 14th International Work-Conference on Artificial Neural Networks, IWANN 2017, Proceedings. Vol. 10306 LNCS Springer Verlag, 2017. p. 271-282 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 10306 LNCS).

Tapiador-Morales, R, Rios-Navarro, A, Linares-Barranco, A, Kim, M, Kadetotad, D & Seo, J 2017, Comprehensive evaluation of openCL-based CNN implementations for FPGAs. in Advances in Computational Intelligence - 14th International Work-Conference on Artificial Neural Networks, IWANN 2017, Proceedings. vol. 10306 LNCS, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 10306 LNCS, Springer Verlag, pp. 271-282, 16th International Conference on Artificial Intelligence and Soft Computing, ICAISC 2017, Zakopane, Poland, 6/11/17. https://doi.org/10.1007/978-3-319-59147-6_24
Tapiador-Morales R, Rios-Navarro A, Linares-Barranco A, Kim M, Kadetotad D, Seo J. Comprehensive evaluation of openCL-based CNN implementations for FPGAs. In Advances in Computational Intelligence - 14th International Work-Conference on Artificial Neural Networks, IWANN 2017, Proceedings. Vol. 10306 LNCS. Springer Verlag. 2017. p. 271-282. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-319-59147-6_24
Tapiador-Morales, Ricardo ; Rios-Navarro, Antonio ; Linares-Barranco, Alejandro ; Kim, Minkyu ; Kadetotad, Deepak ; Seo, Jae-sun. / Comprehensive evaluation of openCL-based CNN implementations for FPGAs. Advances in Computational Intelligence - 14th International Work-Conference on Artificial Neural Networks, IWANN 2017, Proceedings. Vol. 10306 LNCS Springer Verlag, 2017. pp. 271-282 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{5fe49f3a3e9d4fb3957242a6c0a687fd,
title = "Comprehensive evaluation of openCL-based CNN implementations for FPGAs",
abstract = "Deep learning has significantly advanced the state of the art in artificial intelligence, gaining wide popularity from both industry and academia. Special interest is around Convolutional Neural Networks (CNN), which take inspiration from the hierarchical structure of the visual cortex, to form deep layers of convolutional operations, along with fully connected classifiers. Hardware implementations of these deep CNN architectures are challenged with memory bottlenecks that require many convolution and fully-connected layers demanding large amount of communication for parallel computation. Multi-core CPU based solutions have demonstrated their inadequacy for this problem due to the memory wall and low parallelism. Many-core GPU architectures show superior performance but they consume high power and also have memory constraints due to inconsistencies between cache and main memory. OpenCL is commonly used to describe these architectures for their execution on GPGPUs or FPGAs. FPGA design solutions are also actively being explored, which allow implementing the memory hierarchy using embedded parallel BlockRAMs. This boosts the parallel use of shared memory elements between multiple processing units, avoiding data replicability and inconsistencies. This makes FPGAs potentially powerful solutions for real-time classification of CNNs. In this paper both Altera and Xilinx adopted OpenCL co-design frameworks for pseudo-automatic development solutions are evaluated. A comprehensive evaluation and comparison for a 5-layer deep CNN is presented. Hardware resources, temporal performance and the OpenCL architecture for CNNs are discussed. Xilinx demonstrates faster synthesis, better FPGA resource utilization and more compact boards. Altera provides multi-platforms tools, mature design community and better execution times.",
keywords = "Altera, Caffe, Convolutional Neural Network, Deep learning, FPGA, Hardware acceleration, OpenCL, Xilinx",
author = "Ricardo Tapiador-Morales and Antonio Rios-Navarro and Alejandro Linares-Barranco and Minkyu Kim and Deepak Kadetotad and Jae-sun Seo",
year = "2017",
doi = "10.1007/978-3-319-59147-6_24",
language = "English (US)",
isbn = "9783319591469",
volume = "10306 LNCS",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "271--282",
booktitle = "Advances in Computational Intelligence - 14th International Work-Conference on Artificial Neural Networks, IWANN 2017, Proceedings",
address = "Germany",

}

TY - GEN

T1 - Comprehensive evaluation of openCL-based CNN implementations for FPGAs

AU - Tapiador-Morales, Ricardo

AU - Rios-Navarro, Antonio

AU - Linares-Barranco, Alejandro

AU - Kim, Minkyu

AU - Kadetotad, Deepak

AU - Seo, Jae-sun

PY - 2017

Y1 - 2017

N2 - Deep learning has significantly advanced the state of the art in artificial intelligence, gaining wide popularity from both industry and academia. Special interest is around Convolutional Neural Networks (CNN), which take inspiration from the hierarchical structure of the visual cortex, to form deep layers of convolutional operations, along with fully connected classifiers. Hardware implementations of these deep CNN architectures are challenged with memory bottlenecks that require many convolution and fully-connected layers demanding large amount of communication for parallel computation. Multi-core CPU based solutions have demonstrated their inadequacy for this problem due to the memory wall and low parallelism. Many-core GPU architectures show superior performance but they consume high power and also have memory constraints due to inconsistencies between cache and main memory. OpenCL is commonly used to describe these architectures for their execution on GPGPUs or FPGAs. FPGA design solutions are also actively being explored, which allow implementing the memory hierarchy using embedded parallel BlockRAMs. This boosts the parallel use of shared memory elements between multiple processing units, avoiding data replicability and inconsistencies. This makes FPGAs potentially powerful solutions for real-time classification of CNNs. In this paper both Altera and Xilinx adopted OpenCL co-design frameworks for pseudo-automatic development solutions are evaluated. A comprehensive evaluation and comparison for a 5-layer deep CNN is presented. Hardware resources, temporal performance and the OpenCL architecture for CNNs are discussed. Xilinx demonstrates faster synthesis, better FPGA resource utilization and more compact boards. Altera provides multi-platforms tools, mature design community and better execution times.

KW - Altera

KW - Caffe

KW - Convolutional Neural Network

KW - Deep learning

KW - FPGA

KW - Hardware acceleration

KW - OpenCL

KW - Xilinx

UR - http://www.scopus.com/inward/record.url?scp=85020881750&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85020881750&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-59147-6_24

DO - 10.1007/978-3-319-59147-6_24

M3 - Conference contribution

SN - 9783319591469

VL - 10306 LNCS

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 271

EP - 282

BT - Advances in Computational Intelligence - 14th International Work-Conference on Artificial Neural Networks, IWANN 2017, Proceedings

PB - Springer Verlag

ER -