TY - GEN
T1 - A Parallel RRAM Synaptic Array Architecture for Energy-Efficient Recurrent Neural Networks
AU - Yin, Shihui
AU - Sun, Xiaoyu
AU - Yu, Shimeng
AU - Seo, Jae-sun
AU - Chakrabarti, Chaitali
N1 - Funding Information:
This work is supported in part by the NSF/SRC E2CDA program under NSF grant 1740225 and SRC contract 2018-NC-2762, NSF grant 1652866, and C-BRIC, one of six centers in JUMP, an SRC program sponsored by DARPA.
PY - 2018/12/31
Y1 - 2018/12/31
N2 - Recurrent neural networks (RNNs) provide excellent performance on applications with sequential data, such as speech recognition. On-chip implementation of RNNs is difficult due to the large number of parameters and computations. In this work, we first present a training method for an LSTM model for language modeling on the Penn Treebank dataset with binary weights and multi-bit activations, and then map it onto a fully parallel RRAM array architecture ("XNOR-RRAM"). An energy-efficient XNOR-RRAM array based system for the LSTM RNN is implemented and benchmarked on the Penn Treebank dataset. Our results show that 4-bit activation precision provides a near-optimal perplexity of 115.3 with an estimated energy efficiency of 27 TOPS/W.
AB - Recurrent neural networks (RNNs) provide excellent performance on applications with sequential data, such as speech recognition. On-chip implementation of RNNs is difficult due to the large number of parameters and computations. In this work, we first present a training method for an LSTM model for language modeling on the Penn Treebank dataset with binary weights and multi-bit activations, and then map it onto a fully parallel RRAM array architecture ("XNOR-RRAM"). An energy-efficient XNOR-RRAM array based system for the LSTM RNN is implemented and benchmarked on the Penn Treebank dataset. Our results show that 4-bit activation precision provides a near-optimal perplexity of 115.3 with an estimated energy efficiency of 27 TOPS/W.
UR - http://www.scopus.com/inward/record.url?scp=85061355856&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85061355856&partnerID=8YFLogxK
U2 - 10.1109/SiPS.2018.8598445
DO - 10.1109/SiPS.2018.8598445
M3 - Conference contribution
AN - SCOPUS:85061355856
T3 - IEEE Workshop on Signal Processing Systems, SiPS: Design and Implementation
SP - 13
EP - 18
BT - Proceedings of the IEEE Workshop on Signal Processing Systems, SiPS 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2018 IEEE Workshop on Signal Processing Systems, SiPS 2018
Y2 - 21 October 2018 through 24 October 2018
ER -