An 8.93 TOPS/W LSTM Recurrent Neural Network Accelerator Featuring Hierarchical Coarse-Grain Sparsity for On-Device Speech Recognition

Deepak Kadetotad; Shihui Yin; Visar Berisha; Chaitali Chakrabarti; Jae Sun Seo

doi:10.1109/JSSC.2020.2992900

Research output: Contribution to journal › Article › peer-review

36 Scopus citations

Abstract

Long short-term memory (LSTM) is a type of recurrent neural network (RNN) that is widely used for time-series data and speech applications because of its high accuracy on such tasks. However, LSTMs pose difficulties for efficient hardware implementation because they require a large amount of weight storage and exhibit high computational complexity. Prior works have proposed compression techniques to alleviate the storage/computation requirements of LSTMs, but elementwise sparsity schemes incur sizable index memory overhead and structured compression techniques report limited compression ratios. In this article, we present an energy-efficient LSTM RNN accelerator featuring an algorithm-hardware co-optimized memory compression technique called hierarchical coarse-grain sparsity (HCGS). Aided by HCGS-based blockwise recursive weight compression, we demonstrate LSTM networks with up to 16× fewer weights while achieving minimal error rate degradation. The prototype chip fabricated in 65-nm LP CMOS achieves up to 8.93 TOPS/W for real-time speech recognition using compressed LSTMs based on HCGS. HCGS-based LSTMs have demonstrated energy-efficient speech recognition with low error rates for the TIMIT, TED-LIUM, and LibriSpeech data sets.
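The abstract contrasts elementwise sparsity, which must store an index for every surviving weight, with HCGS's blockwise recursive compression. The Python sketch below illustrates that contrast under stated assumptions: a two-level hierarchy, 64×64 first-level blocks and 16×16 second-level blocks, a fixed number of randomly chosen blocks kept per block-row, and a CSR-style column index as the elementwise baseline. The retention counts (1 of 4 at each level) were picked only to reproduce the abstract's 16× figure. The function names and parameters are hypothetical; this is not the authors' implementation or their exact configuration.

```python
import math
import random

# Illustrative sketch only: block sizes, retention counts, and the random
# block selection below are assumptions, not the paper's configuration.

def hcgs_mask(rows, cols, b1, keep1, b2, keep2, seed=0):
    """Build a two-level hierarchical coarse-grain block mask.

    Level 1: tile the matrix into b1 x b1 blocks and keep `keep1`
    block-columns in every block-row.
    Level 2: tile each kept block into b2 x b2 sub-blocks and keep `keep2`
    sub-block-columns in every sub-block-row.
    Every row therefore ends up with the same number of nonzeros.
    """
    if rows % b1 or cols % b1 or b1 % b2:
        raise ValueError("block sizes must tile the matrix evenly")
    rng = random.Random(seed)
    n1, n2 = cols // b1, b1 // b2
    mask = [[False] * cols for _ in range(rows)]
    for br in range(rows // b1):
        for bc in rng.sample(range(n1), keep1):          # level-1 choice
            for sr in range(n2):
                for sc in rng.sample(range(n2), keep2):  # level-2 choice
                    r0, c0 = br * b1 + sr * b2, bc * b1 + sc * b2
                    for r in range(r0, r0 + b2):
                        mask[r][c0:c0 + b2] = [True] * b2
    return mask


def hcgs_index_bits(rows, cols, b1, keep1, b2, keep2):
    """Bits needed to record which blocks and sub-blocks were kept."""
    n1, n2 = cols // b1, b1 // b2
    kept_blocks = (rows // b1) * keep1
    level1 = kept_blocks * math.ceil(math.log2(n1))
    level2 = kept_blocks * n2 * keep2 * math.ceil(math.log2(n2))
    return level1 + level2


def elementwise_index_bits(nnz, cols):
    """CSR-style baseline: one column index per nonzero (row pointers ignored)."""
    return nnz * math.ceil(math.log2(cols))


mask = hcgs_mask(512, 512, b1=64, keep1=2, b2=16, keep2=1)
nnz = sum(map(sum, mask))
print(512 * 512 / nnz)                           # 16.0
print(hcgs_index_bits(512, 512, 64, 2, 16, 1))   # 176
print(elementwise_index_bits(nnz, 512))          # 147456
```

In this toy setting both schemes keep the same 16,384 weights, but the block-level indices take 176 bits versus 147,456 bits for per-weight column indices, which is the kind of index-memory saving the abstract attributes to coarse-grain sparsity. Keeping a fixed number of blocks per block-row also gives every row the same nonzero count, which plausibly simplifies balanced hardware scheduling.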

Original language	English (US)
Article number	9094675
Pages (from-to)	1877-1887
Number of pages	11
Journal	IEEE Journal of Solid-State Circuits
Volume	55
Issue number	7
DOIs	https://doi.org/10.1109/JSSC.2020.2992900
State	Published - Jul 2020

Keywords

Hardware accelerator
long short-term memory (LSTM)
speech recognition
structured sparsity
weight compression

ASJC Scopus subject areas

Electrical and Electronic Engineering

Cite this

@article{42afb2d1cf1844ea997e2d95d80989fe,
title = "An 8.93 {TOPS/W} {LSTM} Recurrent Neural Network Accelerator Featuring Hierarchical Coarse-Grain Sparsity for On-Device Speech Recognition",
abstract = "Long short-term memory (LSTM) is a type of recurrent neural network (RNN) that is widely used for time-series data and speech applications because of its high accuracy on such tasks. However, LSTMs pose difficulties for efficient hardware implementation because they require a large amount of weight storage and exhibit high computational complexity. Prior works have proposed compression techniques to alleviate the storage/computation requirements of LSTMs, but elementwise sparsity schemes incur sizable index memory overhead and structured compression techniques report limited compression ratios. In this article, we present an energy-efficient LSTM RNN accelerator featuring an algorithm-hardware co-optimized memory compression technique called hierarchical coarse-grain sparsity (HCGS). Aided by HCGS-based blockwise recursive weight compression, we demonstrate LSTM networks with up to 16× fewer weights while achieving minimal error rate degradation. The prototype chip fabricated in 65-nm LP CMOS achieves up to 8.93 TOPS/W for real-time speech recognition using compressed LSTMs based on HCGS. HCGS-based LSTMs have demonstrated energy-efficient speech recognition with low error rates for the TIMIT, TED-LIUM, and LibriSpeech data sets.",
keywords = "Hardware accelerator, long short-term memory (LSTM), speech recognition, structured sparsity, weight compression",
author = "Deepak Kadetotad and Shihui Yin and Visar Berisha and Chaitali Chakrabarti and Seo, {Jae Sun}",
note = "Funding Information: Manuscript received December 20, 2019; revised March 13, 2020 and April 29, 2020; accepted April 30, 2020. Date of publication May 18, 2020; date of current version June 29, 2020. This article was approved by Associate Editor Sylvain Clerc. This work was supported in part by NSF under Grant 1652866, in part by Samsung, in part by the Office of Naval Research (ONR), and in part by the Center for Brain-inspired Computing (C-BRIC), one of six centers in the Joint University Microelectronics Program (JUMP), a Semiconductor Research Corporation (SRC) program sponsored by the Defense Advanced Research Projects Agency (DARPA). (Corresponding author: Jae-sun Seo.) Deepak Kadetotad was with the School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ 85281 USA. He is now with Starkey Hearing Technologies, Eden Prairie, MN 55344 USA. Publisher Copyright: {\textcopyright} 1966-2012 IEEE.",
year = "2020",
month = jul,
doi = "10.1109/JSSC.2020.2992900",
language = "English (US)",
volume = "55",
pages = "1877--1887",
journal = "IEEE Journal of Solid-State Circuits",
issn = "0018-9200",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "7",
}

TY  - JOUR
T1  - An 8.93 TOPS/W LSTM Recurrent Neural Network Accelerator Featuring Hierarchical Coarse-Grain Sparsity for On-Device Speech Recognition
AU  - Kadetotad, Deepak
AU  - Yin, Shihui
AU  - Berisha, Visar
AU  - Chakrabarti, Chaitali
AU  - Seo, Jae Sun
N1  - Funding Information: Manuscript received December 20, 2019; revised March 13, 2020 and April 29, 2020; accepted April 30, 2020. Date of publication May 18, 2020; date of current version June 29, 2020. This article was approved by Associate Editor Sylvain Clerc. This work was supported in part by NSF under Grant 1652866, in part by Samsung, in part by the Office of Naval Research (ONR), and in part by the Center for Brain-inspired Computing (C-BRIC), one of six centers in the Joint University Microelectronics Program (JUMP), a Semiconductor Research Corporation (SRC) program sponsored by the Defense Advanced Research Projects Agency (DARPA). (Corresponding author: Jae-sun Seo.) Deepak Kadetotad was with the School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ 85281 USA. He is now with Starkey Hearing Technologies, Eden Prairie, MN 55344 USA. Publisher Copyright: © 1966-2012 IEEE.
PY  - 2020/7
Y1  - 2020/7
N2  - Long short-term memory (LSTM) is a type of recurrent neural network (RNN) that is widely used for time-series data and speech applications because of its high accuracy on such tasks. However, LSTMs pose difficulties for efficient hardware implementation because they require a large amount of weight storage and exhibit high computational complexity. Prior works have proposed compression techniques to alleviate the storage/computation requirements of LSTMs, but elementwise sparsity schemes incur sizable index memory overhead and structured compression techniques report limited compression ratios. In this article, we present an energy-efficient LSTM RNN accelerator featuring an algorithm-hardware co-optimized memory compression technique called hierarchical coarse-grain sparsity (HCGS). Aided by HCGS-based blockwise recursive weight compression, we demonstrate LSTM networks with up to 16× fewer weights while achieving minimal error rate degradation. The prototype chip fabricated in 65-nm LP CMOS achieves up to 8.93 TOPS/W for real-time speech recognition using compressed LSTMs based on HCGS. HCGS-based LSTMs have demonstrated energy-efficient speech recognition with low error rates for the TIMIT, TED-LIUM, and LibriSpeech data sets.
AB  - Long short-term memory (LSTM) is a type of recurrent neural network (RNN) that is widely used for time-series data and speech applications because of its high accuracy on such tasks. However, LSTMs pose difficulties for efficient hardware implementation because they require a large amount of weight storage and exhibit high computational complexity. Prior works have proposed compression techniques to alleviate the storage/computation requirements of LSTMs, but elementwise sparsity schemes incur sizable index memory overhead and structured compression techniques report limited compression ratios. In this article, we present an energy-efficient LSTM RNN accelerator featuring an algorithm-hardware co-optimized memory compression technique called hierarchical coarse-grain sparsity (HCGS). Aided by HCGS-based blockwise recursive weight compression, we demonstrate LSTM networks with up to 16× fewer weights while achieving minimal error rate degradation. The prototype chip fabricated in 65-nm LP CMOS achieves up to 8.93 TOPS/W for real-time speech recognition using compressed LSTMs based on HCGS. HCGS-based LSTMs have demonstrated energy-efficient speech recognition with low error rates for the TIMIT, TED-LIUM, and LibriSpeech data sets.
KW  - Hardware accelerator
KW  - long short-term memory (LSTM)
KW  - speech recognition
KW  - structured sparsity
KW  - weight compression
UR  - http://www.scopus.com/inward/record.url?scp=85091104443&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85091104443&partnerID=8YFLogxK
U2  - 10.1109/JSSC.2020.2992900
DO  - 10.1109/JSSC.2020.2992900
M3  - Article
AN  - SCOPUS:85091104443
SN  - 0018-9200
VL  - 55
SP  - 1877
EP  - 1887
JO  - IEEE Journal of Solid-State Circuits
JF  - IEEE Journal of Solid-State Circuits
IS  - 7
M1  - 9094675
ER  - 
