MAX2: ReRAM-Based Neural Network Accelerator That Maximizes Data Reuse and Area Utilization

Manqing Mao, Xiaochen Peng, Rui Liu, Jingtao Li, Shimeng Yu, Chaitali Chakrabarti

Research output: Contribution to journal › Article

Abstract

Although recent advances in resistive random access memory (ReRAM)-based accelerator designs for deep convolutional neural networks (CNNs) offer energy-efficiency improvements over CMOS-based accelerators, they require a large number of energy-consuming data transactions. In this paper, we propose MAX2, a multi-tile ReRAM accelerator framework that supports multiple CNN topologies, maximizes on-chip data reuse, and reduces on-chip bandwidth to minimize the energy consumed by data movement. Building upon the fact that a large filter can be built with a stack of smaller 3 × 3 filters, we design every tile with nine processing elements (PEs). Each PE consists of multiple ReRAM subarrays to compute the dot product. The PEs operate in a systolic fashion, thereby maximizing input feature map reuse and minimizing interconnection cost. MAX2 chooses the data size granularity in the systolic array in conjunction with weight duplication to achieve very high area utilization without requiring additional peripheral circuits. We provide a detailed energy and area breakdown of each component at the PE level, tile level, and system level. The system-level evaluation at the 32-nm node on several VGG-network benchmarks shows that MAX2 can improve computation efficiency (TOPs/s/mm²) by 2.5× and energy efficiency (TOPs/s/W) by 5.2× compared with a state-of-the-art ReRAM-based accelerator.
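
The abstract's premise that a large filter can be built from a stack of 3 × 3 filters can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the paper: it assumes stride-1, undilated convolutions, and the function names `stacked_receptive_field` and `weights_per_channel_pair` are hypothetical helpers introduced here. It does not model how MAX2 maps filters onto its nine PEs.

```python
def stacked_receptive_field(num_layers: int, kernel: int = 3) -> int:
    """Receptive field of `num_layers` stacked stride-1 kernel x kernel convolutions.

    Assumption: stride 1 and no dilation, so each layer widens the field by kernel - 1.
    """
    return num_layers * (kernel - 1) + 1


def weights_per_channel_pair(kernel: int, num_layers: int = 1) -> int:
    """Weight count for one input/output channel pair across the whole stack."""
    return num_layers * kernel * kernel


if __name__ == "__main__":
    for layers in (1, 2, 3):
        field = stacked_receptive_field(layers)
        stacked = weights_per_channel_pair(3, layers)
        single = weights_per_channel_pair(field)
        print(f"{layers} stacked 3x3 layer(s): {field}x{field} field, "
              f"{stacked} weights vs {single} for one {field}x{field} filter")
```

Under these assumptions, two stacked 3 × 3 layers cover a 5 × 5 window and three cover a 7 × 7 window, in both cases with fewer weights per channel pair than the single large filter.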

Original language: English (US)
Article number: 8680623
Pages (from-to): 398-410
Number of pages: 13
Journal: IEEE Journal on Emerging and Selected Topics in Circuits and Systems
Volume: 9
Issue number: 2
DOI: 10.1109/JETCAS.2019.2908937
State: Published - Jun 1 2019

Fingerprint

Particle accelerators
Tile
Neural networks
Data storage equipment
Processing
Energy efficiency
Systolic arrays
Energy utilization
Topology
Bandwidth
Networks (circuits)
Costs

Keywords

  • accelerator
  • CNN
  • data reuse
  • ReRAM
  • systolic

ASJC Scopus subject areas

  • Electrical and Electronic Engineering

Cite this

MAX2: ReRAM-Based Neural Network Accelerator That Maximizes Data Reuse and Area Utilization. / Mao, Manqing; Peng, Xiaochen; Liu, Rui; Li, Jingtao; Yu, Shimeng; Chakrabarti, Chaitali.

In: IEEE Journal on Emerging and Selected Topics in Circuits and Systems, Vol. 9, No. 2, 8680623, 01.06.2019, p. 398-410.

@article{6d97d59cf7244ad8ab22001b82f56430,
title = "MAX2: ReRAM-Based Neural Network Accelerator That Maximizes Data Reuse and Area Utilization",
abstract = "Although recent advances in resistive random access memory (ReRAM)-based accelerator designs for deep convolutional neural networks (CNNs) offer energy-efficiency improvements over CMOS-based accelerators, they require a large number of energy-consuming data transactions. In this paper, we propose MAX2, a multi-tile ReRAM accelerator framework that supports multiple CNN topologies, maximizes on-chip data reuse, and reduces on-chip bandwidth to minimize the energy consumed by data movement. Building upon the fact that a large filter can be built with a stack of smaller $3\times 3$ filters, we design every tile with nine processing elements (PEs). Each PE consists of multiple ReRAM subarrays to compute the dot product. The PEs operate in a systolic fashion, thereby maximizing input feature map reuse and minimizing interconnection cost. MAX2 chooses the data size granularity in the systolic array in conjunction with weight duplication to achieve very high area utilization without requiring additional peripheral circuits. We provide a detailed energy and area breakdown of each component at the PE level, tile level, and system level. The system-level evaluation at the 32-nm node on several VGG-network benchmarks shows that MAX2 can improve computation efficiency (TOPs/s/mm$^2$) by $2.5\times$ and energy efficiency (TOPs/s/W) by $5.2\times$ compared with a state-of-the-art ReRAM-based accelerator.",
keywords = "accelerator, CNN, data reuse, ReRAM, systolic",
author = "Manqing Mao and Xiaochen Peng and Rui Liu and Jingtao Li and Shimeng Yu and Chaitali Chakrabarti",
year = "2019",
month = "6",
day = "1",
doi = "10.1109/JETCAS.2019.2908937",
language = "English (US)",
volume = "9",
pages = "398--410",
journal = "IEEE Journal on Emerging and Selected Topics in Circuits and Systems",
issn = "2156-3357",
publisher = "IEEE Circuits and Systems Society",
number = "2",
}

TY - JOUR

T1 - MAX2

T2 - ReRAM-Based Neural Network Accelerator That Maximizes Data Reuse and Area Utilization

AU - Mao, Manqing

AU - Peng, Xiaochen

AU - Liu, Rui

AU - Li, Jingtao

AU - Yu, Shimeng

AU - Chakrabarti, Chaitali

PY - 2019/6/1

Y1 - 2019/6/1

N2 - Although recent advances in resistive random access memory (ReRAM)-based accelerator designs for deep convolutional neural networks (CNNs) offer energy-efficiency improvements over CMOS-based accelerators, they require a large number of energy-consuming data transactions. In this paper, we propose MAX2, a multi-tile ReRAM accelerator framework that supports multiple CNN topologies, maximizes on-chip data reuse, and reduces on-chip bandwidth to minimize the energy consumed by data movement. Building upon the fact that a large filter can be built with a stack of smaller 3 × 3 filters, we design every tile with nine processing elements (PEs). Each PE consists of multiple ReRAM subarrays to compute the dot product. The PEs operate in a systolic fashion, thereby maximizing input feature map reuse and minimizing interconnection cost. MAX2 chooses the data size granularity in the systolic array in conjunction with weight duplication to achieve very high area utilization without requiring additional peripheral circuits. We provide a detailed energy and area breakdown of each component at the PE level, tile level, and system level. The system-level evaluation at the 32-nm node on several VGG-network benchmarks shows that MAX2 can improve computation efficiency (TOPs/s/mm²) by 2.5× and energy efficiency (TOPs/s/W) by 5.2× compared with a state-of-the-art ReRAM-based accelerator.

AB - Although recent advances in resistive random access memory (ReRAM)-based accelerator designs for deep convolutional neural networks (CNNs) offer energy-efficiency improvements over CMOS-based accelerators, they require a large number of energy-consuming data transactions. In this paper, we propose MAX2, a multi-tile ReRAM accelerator framework that supports multiple CNN topologies, maximizes on-chip data reuse, and reduces on-chip bandwidth to minimize the energy consumed by data movement. Building upon the fact that a large filter can be built with a stack of smaller 3 × 3 filters, we design every tile with nine processing elements (PEs). Each PE consists of multiple ReRAM subarrays to compute the dot product. The PEs operate in a systolic fashion, thereby maximizing input feature map reuse and minimizing interconnection cost. MAX2 chooses the data size granularity in the systolic array in conjunction with weight duplication to achieve very high area utilization without requiring additional peripheral circuits. We provide a detailed energy and area breakdown of each component at the PE level, tile level, and system level. The system-level evaluation at the 32-nm node on several VGG-network benchmarks shows that MAX2 can improve computation efficiency (TOPs/s/mm²) by 2.5× and energy efficiency (TOPs/s/W) by 5.2× compared with a state-of-the-art ReRAM-based accelerator.

KW - accelerator

KW - CNN

KW - data reuse

KW - ReRAM

KW - systolic

UR - http://www.scopus.com/inward/record.url?scp=85067344928&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85067344928&partnerID=8YFLogxK

U2 - 10.1109/JETCAS.2019.2908937

DO - 10.1109/JETCAS.2019.2908937

M3 - Article

VL - 9

SP - 398

EP - 410

JO - IEEE Journal on Emerging and Selected Topics in Circuits and Systems

JF - IEEE Journal on Emerging and Selected Topics in Circuits and Systems

SN - 2156-3357

IS - 2

M1 - 8680623

ER -