Accelerating low bit-width deep convolution neural network in MRAM

Zhezhi He, Shaahin Angizi, Deliang Fan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

14 Scopus citations

Abstract

Deep Convolution Neural Networks (CNNs) have achieved outstanding performance in image recognition on large-scale datasets. However, the pursuit of higher inference accuracy leads to CNN architectures with deeper layers and denser connections, which inevitably makes their hardware implementation demand ever more memory and computational resources; this can be interpreted as the 'CNN power and memory wall'. Recent research efforts have significantly reduced both model size and computational complexity by using low bit-width weights, activations, and gradients while keeping reasonably good accuracy. In this work, we present different emerging nonvolatile Magnetic Random Access Memory (MRAM) designs that can be leveraged to implement a 'bit-wise in-memory convolution engine', which simultaneously stores network parameters and computes low bit-width convolution. This computing model exploits the 'in-memory computing' concept to accelerate CNN inference and reduce convolution energy consumption, owing to its intrinsic logic-in-memory design and the reduction of data communication.
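For context, low bit-width (e.g., binary) convolution reduces the multiply-accumulate at the core of CNN inference to bit-wise operations, which is what makes a 'bit-wise in-memory convolution engine' attractive. The NumPy sketch below is purely illustrative and is not the authors' MRAM circuit: it emulates the standard XNOR-and-popcount arithmetic for vectors of +/-1 values encoded as bits; the names binarize and xnor_popcount_dot are hypothetical.

    import numpy as np

    def binarize(x):
        """Map real values to {+1, -1}, encoded here as bits {1, 0}."""
        return (x >= 0).astype(np.uint8)

    def xnor_popcount_dot(w_bits, a_bits):
        """Dot product of two +/-1 vectors computed from their bit encodings.

        With bits b in {0, 1} encoding values 2*b - 1 and vector length N,
        dot = 2 * popcount(XNOR(w, a)) - N.
        """
        n = w_bits.size
        agree = (~(w_bits ^ a_bits)) & 1   # XNOR: 1 where the bits match
        return 2 * int(agree.sum()) - n    # popcount -> signed dot product

    # Usage: one 3x3 binary filter applied to one 3x3 input patch.
    rng = np.random.default_rng(0)
    w_patch = rng.standard_normal((3, 3))
    x_patch = rng.standard_normal((3, 3))
    bitwise = xnor_popcount_dot(binarize(w_patch).ravel(), binarize(x_patch).ravel())
    reference = int(np.sign(w_patch).ravel() @ np.sign(x_patch).ravel())
    print(bitwise, reference)   # the two values match

In the paper's setting, the appeal is that XNOR and bit-counting are operations a logic-in-memory array can perform directly on the stored weight bits, so weights do not need to be moved to a separate compute unit.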

Original language: English (US)
Title of host publication: Proceedings - 2018 IEEE Computer Society Annual Symposium on VLSI, ISVLSI 2018
Publisher: IEEE Computer Society
Pages: 533-538
Number of pages: 6
ISBN (Print): 9781538670996
DOIs
State: Published - Aug 7 2018
Externally published: Yes
Event: 17th IEEE Computer Society Annual Symposium on VLSI, ISVLSI 2018 - Hong Kong, Hong Kong
Duration: Jul 9 2018 - Jul 11 2018

Publication series

Name: Proceedings of IEEE Computer Society Annual Symposium on VLSI, ISVLSI
Volume: 2018-July
ISSN (Print): 2159-3469
ISSN (Electronic): 2159-3477

Conference

Conference: 17th IEEE Computer Society Annual Symposium on VLSI, ISVLSI 2018
Country/Territory: Hong Kong
City: Hong Kong
Period: 7/9/18 - 7/11/18

Keywords

  • In-memory computing
  • Magnetic Random Access Memory
  • Neural network acceleration

ASJC Scopus subject areas

  • Hardware and Architecture
  • Control and Systems Engineering
  • Electrical and Electronic Engineering
