TY - GEN
T1 - A Versatile ReRAM-based Accelerator for Convolutional Neural Networks
AU - Mao, Manqing
AU - Sun, Xiaoyu
AU - Peng, Xiaochen
AU - Yu, Shimeng
AU - Chakrabarti, Chaitali
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/12/31
Y1 - 2018/12/31
N2 - Though recent resistive random access memory (ReRAM)-based accelerator designs for convolutional neural networks (CNNs) achieve superior timing performance and area efficiency over CMOS-based accelerators, they suffer from high energy consumption due to low inter-layer data reuse. In this work, we propose a multi-tile ReRAM accelerator that supports multiple CNN topologies, where each tile processes one or more layers in a pipelined fashion. Building upon the fact that a filter with a large receptive field can be realized as a stack of smaller (3×3) filters, we design every tile with 9 processing elements that operate in a systolic fashion. The systolic data flow maximizes input feature map reuse and minimizes interconnect cost. We show that 1-bit weights and 4-bit activations achieve good accuracy for both AlexNet and VGGNet, and design our ReRAM-based accelerator to support this configuration. System-level simulation results at the 32 nm node show that the proposed architecture for AlexNet with stacked small filters achieves a computation efficiency of 8.42 TOPs/s/mm², an energy efficiency of 4.08 TOPs/s/W, and a storage efficiency of 0.18 MB/mm² for inference on one image from the CIFAR-100 dataset.
AB - Though recent resistive random access memory (ReRAM)-based accelerator designs for convolutional neural networks (CNNs) achieve superior timing performance and area efficiency over CMOS-based accelerators, they suffer from high energy consumption due to low inter-layer data reuse. In this work, we propose a multi-tile ReRAM accelerator that supports multiple CNN topologies, where each tile processes one or more layers in a pipelined fashion. Building upon the fact that a filter with a large receptive field can be realized as a stack of smaller (3×3) filters, we design every tile with 9 processing elements that operate in a systolic fashion. The systolic data flow maximizes input feature map reuse and minimizes interconnect cost. We show that 1-bit weights and 4-bit activations achieve good accuracy for both AlexNet and VGGNet, and design our ReRAM-based accelerator to support this configuration. System-level simulation results at the 32 nm node show that the proposed architecture for AlexNet with stacked small filters achieves a computation efficiency of 8.42 TOPs/s/mm², an energy efficiency of 4.08 TOPs/s/W, and a storage efficiency of 0.18 MB/mm² for inference on one image from the CIFAR-100 dataset.
KW - CNN
KW - ReRAM
KW - accelerator
KW - systolic
UR - http://www.scopus.com/inward/record.url?scp=85061367988&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85061367988&partnerID=8YFLogxK
U2 - 10.1109/SiPS.2018.8598372
DO - 10.1109/SiPS.2018.8598372
M3 - Conference contribution
AN - SCOPUS:85061367988
T3 - IEEE Workshop on Signal Processing Systems, SiPS: Design and Implementation
SP - 211
EP - 216
BT - Proceedings of the IEEE Workshop on Signal Processing Systems, SiPS 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2018 IEEE Workshop on Signal Processing Systems, SiPS 2018
Y2 - 21 October 2018 through 24 October 2018
ER -