TY - JOUR
T1 - An 8.93 TOPS/W LSTM Recurrent Neural Network Accelerator Featuring Hierarchical Coarse-Grain Sparsity for On-Device Speech Recognition
AU - Kadetotad, Deepak
AU - Yin, Shihui
AU - Berisha, Visar
AU - Chakrabarti, Chaitali
AU - Seo, Jae-sun
N1 - Funding Information:
Manuscript received December 20, 2019; revised March 13, 2020 and April 29, 2020; accepted April 30, 2020. Date of publication May 18, 2020; date of current version June 29, 2020. This article was approved by Associate Editor Sylvain Clerc. This work was supported in part by NSF under Grant 1652866, in part by Samsung, in part by the Office of Naval Research (ONR), and in part by the Center for Brain-inspired Computing (C-BRIC), one of six centers in the Joint University Microelectronics Program (JUMP), a Semiconductor Research Corporation (SRC) program sponsored by the Defense Advanced Research Projects Agency (DARPA). (Corresponding author: Jae-sun Seo.) Deepak Kadetotad was with the School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ 85281 USA. He is now with Starkey Hearing Technologies, Eden Prairie, MN 55344 USA.
Publisher Copyright:
© 2020 IEEE.
PY - 2020/7
Y1 - 2020/7
N2 - Long short-term memory (LSTM) is a type of recurrent neural network (RNN) that is widely used for time-series data and speech applications due to its high accuracy on such tasks. However, LSTMs pose difficulties for efficient hardware implementation because they require a large amount of weight storage and exhibit high computational complexity. Prior works have proposed compression techniques to alleviate the storage/computation requirements of LSTMs, but elementwise sparsity schemes incur sizable index memory overhead, and structured compression techniques report limited compression ratios. In this article, we present an energy-efficient LSTM RNN accelerator featuring an algorithm-hardware co-optimized memory compression technique called hierarchical coarse-grain sparsity (HCGS). Aided by HCGS-based blockwise recursive weight compression, we demonstrate LSTM networks with up to 16× fewer weights while achieving minimal error rate degradation. The prototype chip fabricated in 65-nm LP CMOS achieves up to 8.93 TOPS/W for real-time speech recognition using compressed LSTMs based on HCGS. HCGS-based LSTMs have demonstrated energy-efficient speech recognition with low error rates for the TIMIT, TED-LIUM, and LibriSpeech data sets.
KW - Hardware accelerator
KW - long short-term memory (LSTM)
KW - speech recognition
KW - structured sparsity
KW - weight compression
UR - http://www.scopus.com/inward/record.url?scp=85091104443&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85091104443&partnerID=8YFLogxK
U2 - 10.1109/JSSC.2020.2992900
DO - 10.1109/JSSC.2020.2992900
M3 - Article
AN - SCOPUS:85091104443
SN - 0018-9200
VL - 55
SP - 1877
EP - 1887
JO - IEEE Journal of Solid-State Circuits
JF - IEEE Journal of Solid-State Circuits
IS - 7
M1 - 9094675
ER -