TY - GEN
T1 - An 8.93-TOPS/W LSTM Recurrent Neural Network Accelerator Featuring Hierarchical Coarse-Grain Sparsity with All Parameters Stored On-Chip
AU - Kadetotad, Deepak
AU - Berisha, Visar
AU - Chakrabarti, Chaitali
AU - Seo, Jae Sun
N1 - Funding Information:
Manuscript received May 30, 2019; revised August 8, 2019; accepted August 11, 2019. Date of publication October 15, 2019; date of current version October 15, 2019. This article was approved by Associate Editor Tobias Gemmeke. This work was supported in part by NSF under Grant 1652866, in part by Samsung, in part by ONR, and in part by C-BRIC, one of six centers in JUMP, an SRC program sponsored by DARPA. (Corresponding author: Deepak Kadetotad.) The authors are with the School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ 85281 USA (e-mail: dkadetot@asu.edu). Digital Object Identifier 10.1109/LSSC.2019.2936761
Publisher Copyright:
© 2019 IEEE.
PY - 2019/9
Y1 - 2019/9
N2 - Long short-term memory (LSTM) networks are widely used for speech applications but pose difficulties for efficient implementation on hardware due to large weight storage requirements. We present an energy-efficient LSTM recurrent neural network (RNN) accelerator, featuring an algorithm-hardware co-optimized memory compression technique called hierarchical coarse-grain sparsity (HCGS). Aided by HCGS-based block-wise recursive weight compression, we demonstrate LSTM networks with up to 16× fewer weights while achieving minimal accuracy loss. The prototype chip fabricated in 65-nm LP CMOS achieves 8.93/7.22 TOPS/W for 2-/3-layer LSTM RNNs trained with HCGS for TIMIT/TED-LIUM corpora.
AB - Long short-term memory (LSTM) networks are widely used for speech applications but pose difficulties for efficient implementation on hardware due to large weight storage requirements. We present an energy-efficient LSTM recurrent neural network (RNN) accelerator, featuring an algorithm-hardware co-optimized memory compression technique called hierarchical coarse-grain sparsity (HCGS). Aided by HCGS-based block-wise recursive weight compression, we demonstrate LSTM networks with up to 16× fewer weights while achieving minimal accuracy loss. The prototype chip fabricated in 65-nm LP CMOS achieves 8.93/7.22 TOPS/W for 2-/3-layer LSTM RNNs trained with HCGS for TIMIT/TED-LIUM corpora.
KW - Hardware accelerator
KW - long short-term memory (LSTM)
KW - speech recognition
KW - structured sparsity weight compression
UR - http://www.scopus.com/inward/record.url?scp=85075908969&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85075908969&partnerID=8YFLogxK
U2 - 10.1109/ESSCIRC.2019.8902809
DO - 10.1109/ESSCIRC.2019.8902809
M3 - Conference contribution
AN - SCOPUS:85075908969
T3 - ESSCIRC 2019 - IEEE 45th European Solid State Circuits Conference
SP - 119
EP - 122
BT - ESSCIRC 2019 - IEEE 45th European Solid State Circuits Conference
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 45th IEEE European Solid State Circuits Conference, ESSCIRC 2019
Y2 - 23 September 2019 through 26 September 2019
ER -