TY - GEN
T1 - IMCE
T2 - 23rd Asia and South Pacific Design Automation Conference, ASP-DAC 2018
AU - Angizi, Shaahin
AU - He, Zhezhi
AU - Parveen, Farhana
AU - Fan, Deliang
N1 - Funding Information:
This material is based upon work supported in part by the National Science Foundation under Grant No. 1740126.
Publisher Copyright:
© 2018 IEEE.
PY - 2018/2/20
Y1 - 2018/2/20
N2 - In this paper, we pave a novel way towards the concept of a bit-wise In-Memory Convolution Engine (IMCE) that can implement the dominant convolution computations of deep Convolutional Neural Networks (CNNs) within memory. IMCE employs parallel computational memory sub-arrays as its fundamental units, based on our proposed Spin-Orbit Torque Magnetic Random Access Memory (SOT-MRAM) design. We then propose an accelerator system architecture based on IMCE to efficiently process low bit-width CNNs. This architecture can greatly reduce the energy consumption of convolutional layers and also accelerate CNN inference. Device-to-architecture co-simulation results show that the proposed system architecture can process low bit-width AlexNet on the ImageNet dataset at 785.25 μJ/img, consuming ∼3× less energy than a recent RRAM-based counterpart while occupying ∼4× smaller chip area.
AB - In this paper, we pave a novel way towards the concept of a bit-wise In-Memory Convolution Engine (IMCE) that can implement the dominant convolution computations of deep Convolutional Neural Networks (CNNs) within memory. IMCE employs parallel computational memory sub-arrays as its fundamental units, based on our proposed Spin-Orbit Torque Magnetic Random Access Memory (SOT-MRAM) design. We then propose an accelerator system architecture based on IMCE to efficiently process low bit-width CNNs. This architecture can greatly reduce the energy consumption of convolutional layers and also accelerate CNN inference. Device-to-architecture co-simulation results show that the proposed system architecture can process low bit-width AlexNet on the ImageNet dataset at 785.25 μJ/img, consuming ∼3× less energy than a recent RRAM-based counterpart while occupying ∼4× smaller chip area.
UR - http://www.scopus.com/inward/record.url?scp=85045299946&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85045299946&partnerID=8YFLogxK
U2 - 10.1109/ASPDAC.2018.8297291
DO - 10.1109/ASPDAC.2018.8297291
M3 - Conference contribution
AN - SCOPUS:85045299946
T3 - Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC
SP - 111
EP - 116
BT - ASP-DAC 2018 - 23rd Asia and South Pacific Design Automation Conference, Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 22 January 2018 through 25 January 2018
ER -