Optimizing weight mapping and data flow for convolutional neural networks on RRAM based processing-in-memory architecture

Xiaochen Peng; Rui Liu; Shimeng Yu

doi:10.1109/ISCAS.2019.8702715

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

61 Scopus citations

Abstract

Resistive random access memory (RRAM) based array architecture has been proposed for on-chip acceleration of convolutional neural networks (CNNs), where the array can be configured for dot-product computation in a parallel fashion by summing up the column currents. Prior processing-in-memory (PIM) designs unroll each 3D kernel of the convolutional layers into a vertical column of a large weight matrix, so that the input data must be accessed multiple times. As a result, significant latency and energy are consumed in the interconnect and buffer. In this paper, in order to maximize both weight and input data reuse for RRAM based PIM architecture, we propose a novel weight mapping method and the corresponding data flow, which divides the kernels and assigns the input data to different processing elements (PEs) according to their spatial locations. The proposed design achieves ~65% savings in latency and energy for interconnect and buffer, and yields an overall 2.1× speedup and ~17% improvement in energy efficiency in terms of TOPS/W for the VGG-16 CNN, compared with the prior design based on the conventional mapping method.
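The two mappings described in the abstract can be illustrated with a small functional sketch. This is one reading of the abstract, not the authors' implementation: the function names, tensor layouts (x[row][col][channel], w[kernel][i][j][channel]), stride 1, and the absence of padding are all assumptions made for illustration. The conventional mapping unrolls each K×K×C kernel into one column of a (K·K·C)×N weight matrix; the alternative keeps one C×N sub-matrix per kernel spatial location (i, j), as if each were held by a separate PE, and accumulates the partial sums. The sketch only checks that both mappings produce the same outputs; it does not model latency, energy, or the paper's data flow between PEs.

```python
import random

def matvec(vec, mat):
    # out[n] = sum_r vec[r] * mat[r][n]; stands in for summing column currents.
    return [sum(v * row[n] for v, row in zip(vec, mat)) for n in range(len(mat[0]))]

def unrolled_mapping(w, K, C, N):
    # Conventional mapping: kernel n becomes column n of a (K*K*C) x N matrix.
    return [[w[n][i][j][c] for n in range(N)]
            for i in range(K) for j in range(K) for c in range(C)]

def spatial_mapping(w, K, C, N):
    # Assumed alternative: one C x N sub-matrix per spatial location (i, j).
    return {(i, j): [[w[n][i][j][c] for n in range(N)] for c in range(C)]
            for i in range(K) for j in range(K)}

def conv_unrolled(x, W, K, C):
    H, Wd = len(x), len(x[0])
    out = []
    for r in range(H - K + 1):
        out_row = []
        for s in range(Wd - K + 1):
            # The whole K*K*C window is gathered for every output position.
            window = [x[r + i][s + j][c]
                      for i in range(K) for j in range(K) for c in range(C)]
            out_row.append(matvec(window, W))
        out.append(out_row)
    return out

def conv_spatial(x, subs, K, N):
    H, Wd = len(x), len(x[0])
    out = []
    for r in range(H - K + 1):
        out_row = []
        for s in range(Wd - K + 1):
            acc = [0] * N
            for (i, j), sub in subs.items():
                # Each hypothetical PE sees only the length-C channel vector
                # at its own spatial offset and contributes a partial sum.
                partial = matvec(x[r + i][s + j], sub)
                acc = [a + p for a, p in zip(acc, partial)]
            out_row.append(acc)
        out.append(out_row)
    return out

# Small integer example so the comparison is exact.
random.seed(0)
H, Wd, C, K, N = 5, 5, 3, 3, 4
x = [[[random.randint(-3, 3) for _ in range(C)] for _ in range(Wd)] for _ in range(H)]
w = [[[[random.randint(-3, 3) for _ in range(C)] for _ in range(K)]
      for _ in range(K)] for _ in range(N)]

out_unrolled = conv_unrolled(x, unrolled_mapping(w, K, C, N), K, C)
out_spatial = conv_spatial(x, spatial_mapping(w, K, C, N), K, N)
assert out_unrolled == out_spatial
```

In this sketch the spatial mapping replaces one tall matrix with K·K shorter ones, so each sub-array needs only the C-element channel vector of a single input pixel rather than a freshly gathered window.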

Original language	English (US)
Title of host publication	2019 IEEE International Symposium on Circuits and Systems, ISCAS 2019 - Proceedings
Publisher	Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic)	9781728103976
DOIs	https://doi.org/10.1109/ISCAS.2019.8702715
State	Published - 2019
Event	2019 IEEE International Symposium on Circuits and Systems, ISCAS 2019 - Sapporo, Japan
Duration	May 26 2019 → May 29 2019

Publication series

Name	Proceedings - IEEE International Symposium on Circuits and Systems
Volume	2019-May
ISSN (Print)	0271-4310

Conference

Conference	2019 IEEE International Symposium on Circuits and Systems, ISCAS 2019
Country/Territory	Japan
City	Sapporo
Period	5/26/19 → 5/29/19

Keywords

Deep neural network
Hardware accelerator
Machine learning
Non-volatile memory
Processing-in-memory

ASJC Scopus subject areas

Electrical and Electronic Engineering

Access to Document

10.1109/ISCAS.2019.8702715

Cite this

Peng, X., Liu, R., & Yu, S. (2019). Optimizing weight mapping and data flow for convolutional neural networks on RRAM based processing-in-memory architecture. In 2019 IEEE International Symposium on Circuits and Systems, ISCAS 2019 - Proceedings, Article 8702715 (Proceedings - IEEE International Symposium on Circuits and Systems; Vol. 2019-May). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ISCAS.2019.8702715

Optimizing weight mapping and data flow for convolutional neural networks on RRAM based processing-in-memory architecture. / Peng, Xiaochen; Liu, Rui; Yu, Shimeng.
2019 IEEE International Symposium on Circuits and Systems, ISCAS 2019 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2019. 8702715 (Proceedings - IEEE International Symposium on Circuits and Systems; Vol. 2019-May).

Peng, X, Liu, R & Yu, S 2019, Optimizing weight mapping and data flow for convolutional neural networks on RRAM based processing-in-memory architecture. in 2019 IEEE International Symposium on Circuits and Systems, ISCAS 2019 - Proceedings, 8702715, Proceedings - IEEE International Symposium on Circuits and Systems, vol. 2019-May, Institute of Electrical and Electronics Engineers Inc., 2019 IEEE International Symposium on Circuits and Systems, ISCAS 2019, Sapporo, Japan, 5/26/19. https://doi.org/10.1109/ISCAS.2019.8702715

Peng X, Liu R, Yu S. Optimizing weight mapping and data flow for convolutional neural networks on RRAM based processing-in-memory architecture. In 2019 IEEE International Symposium on Circuits and Systems, ISCAS 2019 - Proceedings. Institute of Electrical and Electronics Engineers Inc. 2019. 8702715. (Proceedings - IEEE International Symposium on Circuits and Systems). doi: 10.1109/ISCAS.2019.8702715

Peng, Xiaochen; Liu, Rui; Yu, Shimeng. / Optimizing weight mapping and data flow for convolutional neural networks on RRAM based processing-in-memory architecture. 2019 IEEE International Symposium on Circuits and Systems, ISCAS 2019 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2019. (Proceedings - IEEE International Symposium on Circuits and Systems).

@inproceedings{3cddf8925c2a4656b4048e97a85afd3b,

title = "Optimizing weight mapping and data flow for convolutional neural networks on {RRAM} based processing-in-memory architecture",

abstract = "Resistive random access memory (RRAM) based array architecture has been proposed for on-chip acceleration of convolutional neural networks (CNNs), where the array can be configured for dot-product computation in a parallel fashion by summing up the column currents. Prior processing-in-memory (PIM) designs unroll each 3D kernel of the convolutional layers into a vertical column of a large weight matrix, so that the input data must be accessed multiple times. As a result, significant latency and energy are consumed in the interconnect and buffer. In this paper, in order to maximize both weight and input data reuse for RRAM based PIM architecture, we propose a novel weight mapping method and the corresponding data flow, which divides the kernels and assigns the input data to different processing elements (PEs) according to their spatial locations. The proposed design achieves ~65% savings in latency and energy for interconnect and buffer, and yields an overall 2.1× speedup and ~17% improvement in energy efficiency in terms of TOPS/W for the VGG-16 CNN, compared with the prior design based on the conventional mapping method.",

keywords = "Deep neural network, Hardware accelerator, Machine learning, Non-volatile memory, Processing-in-memory",

author = "Xiaochen Peng and Rui Liu and Shimeng Yu",

note = "Funding Information: This work is supported by ASCENT, one of the SRC/DARPA JUMP centers, NSF-CCF-1903951, NSF-CCF-1740225, SRC Contract 2018-NC-2762 and Samsung. Publisher Copyright: {\textcopyright} 2019 IEEE; 2019 IEEE International Symposium on Circuits and Systems, ISCAS 2019 ; Conference date: 26-05-2019 Through 29-05-2019",

year = "2019",

doi = "10.1109/ISCAS.2019.8702715",

language = "English (US)",

series = "Proceedings - IEEE International Symposium on Circuits and Systems",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

booktitle = "2019 IEEE International Symposium on Circuits and Systems, ISCAS 2019 - Proceedings",

}

TY - GEN

T1 - Optimizing weight mapping and data flow for convolutional neural networks on RRAM based processing-in-memory architecture

AU - Peng, Xiaochen

AU - Liu, Rui

AU - Yu, Shimeng

PY - 2019

Y1 - 2019

AB - Resistive random access memory (RRAM) based array architecture has been proposed for on-chip acceleration of convolutional neural networks (CNNs), where the array can be configured for dot-product computation in a parallel fashion by summing up the column currents. Prior processing-in-memory (PIM) designs unroll each 3D kernel of the convolutional layers into a vertical column of a large weight matrix, so that the input data must be accessed multiple times. As a result, significant latency and energy are consumed in the interconnect and buffer. In this paper, in order to maximize both weight and input data reuse for RRAM based PIM architecture, we propose a novel weight mapping method and the corresponding data flow, which divides the kernels and assigns the input data to different processing elements (PEs) according to their spatial locations. The proposed design achieves ~65% savings in latency and energy for interconnect and buffer, and yields an overall 2.1× speedup and ~17% improvement in energy efficiency in terms of TOPS/W for the VGG-16 CNN, compared with the prior design based on the conventional mapping method.

KW - Deep neural network

KW - Hardware accelerator

KW - Machine learning

KW - Non-volatile memory

KW - Processing-in-memory

UR - http://www.scopus.com/inward/record.url?scp=85066804151&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85066804151&partnerID=8YFLogxK

U2 - 10.1109/ISCAS.2019.8702715

DO - 10.1109/ISCAS.2019.8702715

M3 - Conference contribution

AN - SCOPUS:85066804151

T3 - Proceedings - IEEE International Symposium on Circuits and Systems

BT - 2019 IEEE International Symposium on Circuits and Systems, ISCAS 2019 - Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 2019 IEEE International Symposium on Circuits and Systems, ISCAS 2019

Y2 - 26 May 2019 through 29 May 2019

ER -
