Abstract

Though recent resistive random access memory (ReRAM)-based accelerator designs for convolutional neural networks (CNNs) achieve superior timing performance and area efficiency over CMOS-based accelerators, they suffer from high energy consumption due to low inter-layer data reuse. In this work, we propose a multi-tile ReRAM accelerator that supports multiple CNN topologies, where each tile processes one or more layers in a pipelined fashion. Building on the fact that a layer with a large receptive field can be realized as a stack of smaller (3×3) filters, we design every tile with 9 processing elements that operate in a systolic fashion. The systolic data flow maximizes input feature map reuse and minimizes interconnection cost. We show that 1-bit weights and 4-bit activations achieve good accuracy for both AlexNet and VGGNet, and design our ReRAM-based accelerator to support this configuration. System-level simulation results at the 32 nm node show that the proposed architecture for AlexNet with stacked small filters achieves a computation efficiency of 8.42 TOPs/s/mm², an energy efficiency of 4.08 TOPs/s/W, and a storage efficiency of 0.18 MB/mm² for inference on one image from the CIFAR-100 dataset.
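The filter-stacking observation the abstract relies on (popularized by VGGNet) is simple arithmetic: for stride-1 convolutions, the receptive field of a stack grows linearly with depth, so two stacked 3×3 layers cover the same 5×5 window as a single 5×5 filter, and three cover a 7×7 window. A minimal sketch of that calculation (the function name is ours, not from the paper):

```python
def stacked_receptive_field(kernel_size: int, num_layers: int) -> int:
    """Receptive field of `num_layers` stacked stride-1 convolutions,
    each with a square kernel of side `kernel_size`.

    Each additional layer extends the receptive field by (kernel_size - 1).
    """
    return 1 + num_layers * (kernel_size - 1)

# Two stacked 3x3 filters see a 5x5 input window; three see 7x7.
# A stack also uses fewer weights: 2 * 3*3 = 18 < 5*5 = 25 per channel pair.
print(stacked_receptive_field(3, 2))  # 5
print(stacked_receptive_field(3, 3))  # 7
```

This equivalence is what lets every tile be built from nine 3×3-oriented processing elements yet still emulate layers with larger receptive fields.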

Original language: English (US)
Title of host publication: Proceedings of the IEEE Workshop on Signal Processing Systems, SiPS 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 211-216
Number of pages: 6
ISBN (Electronic): 9781538663189
DOI: 10.1109/SiPS.2018.8598372
State: Published - Dec 31 2018
Event: 2018 IEEE Workshop on Signal Processing Systems, SiPS 2018 - Cape Town, South Africa
Duration: Oct 21 2018 - Oct 24 2018

Publication series

Name: IEEE Workshop on Signal Processing Systems, SiPS: Design and Implementation
Volume: 2018-October
ISSN (Print): 1520-6130

Conference

Conference: 2018 IEEE Workshop on Signal Processing Systems, SiPS 2018
Country: South Africa
City: Cape Town
Period: 10/21/18 - 10/24/18

Keywords

  • accelerator
  • CNN
  • ReRAM
  • systolic

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Signal Processing
  • Applied Mathematics
  • Hardware and Architecture

Cite this

Mao, M., Sun, X., Peng, X., Yu, S., & Chakrabarti, C. (2018). A Versatile ReRAM-based Accelerator for Convolutional Neural Networks. In Proceedings of the IEEE Workshop on Signal Processing Systems, SiPS 2018 (pp. 211-216). [8598372] (IEEE Workshop on Signal Processing Systems, SiPS: Design and Implementation; Vol. 2018-October). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/SiPS.2018.8598372
