Structured Pruning of RRAM Crossbars for Efficient In-Memory Computing Acceleration of Deep Neural Networks

Jian Meng, Li Yang, Xiaochen Peng, Shimeng Yu, Deliang Fan, Jae Sun Seo

Research output: Contribution to journal › Article › peer-review

28 Scopus citations

Abstract

The high computational complexity and large number of parameters of deep neural networks (DNNs) impose a heavy burden on deep learning hardware design, limiting efficient storage and deployment. With the advantages of high-density storage, non-volatility, and low energy consumption, resistive RAM (RRAM) crossbar-based in-memory computing (IMC) has emerged as a promising technique for DNN acceleration. To fully exploit crossbar-based IMC efficiency, a systematic compression design that considers both the hardware and the algorithm is necessary. In this brief, we present a system-level design that jointly considers low-precision weights and activations, structured pruning, and RRAM crossbar mapping. The proposed multi-group Lasso algorithm and the corresponding hardware implementation are evaluated on ResNet/VGG models with the CIFAR-10/ImageNet datasets. With a fully quantized 4-bit ResNet-18 for CIFAR-10, we achieve up to 65.4× compression compared to the full-precision software baseline and 7× energy reduction compared to 4-bit unpruned RRAM IMC hardware, with 1.1% accuracy loss. For the fully quantized 4-bit ResNet-18 model on the ImageNet dataset, we achieve up to 10.9× structured compression with 1.9% accuracy degradation.
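The core algorithmic idea named in the abstract, group-Lasso regularization whose groups align with RRAM crossbar structure, can be illustrated with a short sketch. The following is a minimal illustration, not the authors' implementation: it assumes PyTorch, a hypothetical crossbar height of 64 rows, and a simple grouping of each output column within each crossbar tile; the paper's multi-group Lasso formulation and mapping details may differ.

```python
# A minimal sketch (not the authors' code) of crossbar-aware group-Lasso
# regularization. The crossbar height (64 rows) and the per-tile column
# grouping are illustrative assumptions.
import torch
import torch.nn as nn

def crossbar_group_lasso(conv: nn.Conv2d, xbar_rows: int = 64) -> torch.Tensor:
    """Sum of per-group L2 norms over the unrolled conv weight, grouped so
    that a pruned group removes a whole crossbar column segment rather
    than scattered weights."""
    # Unroll the 4-D conv weight into the 2-D matrix that would be mapped
    # onto crossbars: rows = in_channels * k * k, cols = out_channels.
    w = conv.weight.reshape(conv.out_channels, -1).t()      # (rows, cols)
    rows = w.shape[0]
    # Pad the row count to a multiple of the crossbar height, then split:
    # each group is one output column's slice inside one crossbar tile.
    pad = (-rows) % xbar_rows
    w = torch.cat([w, w.new_zeros(pad, w.shape[1])], dim=0)
    groups = w.reshape(-1, xbar_rows, w.shape[1])           # (tiles, xbar_rows, cols)
    return groups.norm(p=2, dim=1).sum()                    # group-Lasso penalty

# Usage: add the weighted penalty to the task loss during training,
# e.g. loss = task_loss + lam * crossbar_group_lasso(conv).
conv = nn.Conv2d(64, 128, kernel_size=3)
penalty = 1e-4 * crossbar_group_lasso(conv, xbar_rows=64)
penalty.backward()  # gradients push whole crossbar groups toward zero
```

Because the L2 norm is taken over an entire per-tile column, the penalty drives all weights in a group to zero together, so the corresponding crossbar column segment can be skipped in hardware, which is what enables the energy reduction reported above.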

Original language: English (US)
Article number: 9387391
Pages (from-to): 1576-1580
Number of pages: 5
Journal: IEEE Transactions on Circuits and Systems II: Express Briefs
Volume: 68
Issue number: 5
DOIs
State: Published - May 2021

Keywords

  • Convolutional neural networks
  • hardware accelerator
  • in-memory computing
  • resistive RAM
  • structured pruning

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
