An 8.93-TOPS/W LSTM Recurrent Neural Network Accelerator Featuring Hierarchical Coarse-Grain Sparsity With All Parameters Stored On-Chip

Deepak Kadetotad; Visar Berisha; Chaitali Chakrabarti; Jae Sun Seo

doi:10.1109/LSSC.2019.2936761

Research output: Contribution to journal › Article › peer-review

13 Scopus citations

Abstract

Long short-term memory (LSTM) networks are widely used for speech applications but pose difficulties for efficient implementation on hardware due to large weight storage requirements. We present an energy-efficient LSTM recurrent neural network (RNN) accelerator, featuring an algorithm-hardware co-optimized memory compression technique called hierarchical coarse-grain sparsity (HCGS). Aided by HCGS-based block-wise recursive weight compression, we demonstrate LSTM networks with up to 16× fewer weights while achieving minimal accuracy loss. The prototype chip fabricated in 65-nm LP CMOS achieves 8.93/7.22 TOPS/W for 2-/3-layer LSTM RNNs trained with HCGS for TIMIT/TED-LIUM corpora.
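The abstract describes HCGS only at a high level. As a rough illustration of what "block-wise recursive" coarse-grain sparsity can look like, the Python sketch below builds a two-level block-sparsity mask for a weight matrix. It rests on assumptions of our own, not on details given in this record: square blocks, a random choice of surviving blocks fixed once before training, and hypothetical block sizes (64 and 16) with hypothetical keep counts (1 of 4 blocks per block-row at each level). Those values are chosen only so that the per-level ratios multiply to 4 × 4 = 16, matching the "up to 16×" figure. The names `hcgs_mask` and `compression_ratio` are hypothetical helpers, not the authors' code.

```python
import random
from typing import List, Sequence, Tuple


def hcgs_mask(rows: int, cols: int,
              levels: Sequence[Tuple[int, int]],
              seed: int = 0) -> List[List[int]]:
    """Return a rows x cols 0/1 mask with hierarchical block sparsity.

    levels: (block_size, keep_per_block_row) pairs, outermost level first
    (a hypothetical parameterization). At each level the current region is
    tiled into square block_size x block_size blocks; in every block-row,
    keep_per_block_row blocks are chosen at random and the rest are zeroed.
    Each retained block is then subdivided by the next level.
    """
    rng = random.Random(seed)  # fixed seed: pattern is chosen once, before training
    mask = [[0] * cols for _ in range(rows)]

    def recurse(r0: int, c0: int, h: int, w: int, depth: int) -> None:
        if depth == len(levels):
            # Innermost retained block: every weight in it is kept.
            for r in range(r0, r0 + h):
                for c in range(c0, c0 + w):
                    mask[r][c] = 1
            return
        block, keep = levels[depth]
        if h % block or w % block:
            raise ValueError("region does not tile evenly into blocks")
        n_block_cols = w // block
        if not 0 < keep <= n_block_cols:
            raise ValueError("keep must be between 1 and the number of block columns")
        for br in range(h // block):
            # Pick which blocks survive in this block-row, then recurse into them.
            for bc in rng.sample(range(n_block_cols), keep):
                recurse(r0 + br * block, c0 + bc * block, block, block, depth + 1)

    recurse(0, 0, rows, cols, 0)
    return mask


def compression_ratio(mask: List[List[int]]) -> float:
    """Total weight positions divided by retained (nonzero) positions."""
    total = sum(len(row) for row in mask)
    kept = sum(sum(row) for row in mask)
    return total / kept


if __name__ == "__main__":
    # Hypothetical 256 x 256 matrix: keep 1 of 4 64x64 blocks per block-row,
    # then 1 of 4 16x16 sub-blocks per sub-block-row inside each kept block.
    m = hcgs_mask(256, 256, levels=[(64, 1), (16, 1)])
    print(compression_ratio(m))     # 16.0
    print({sum(row) for row in m})  # {16}
```

Because every block-row keeps the same number of blocks at every level, each row of the resulting matrix in this sketch ends up with the same number of nonzero weights. That regularity illustrates why coarse-grain structured sparsity can be simpler to index in hardware than unstructured pruning; how the actual chip encodes and stores the retained blocks is not described in this record.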

Original language	English (US)
Article number	8877949
Pages (from-to)	119-122
Number of pages	4
Journal	IEEE Solid-State Circuits Letters
Volume	2
Issue number	9
DOIs	https://doi.org/10.1109/LSSC.2019.2936761
State	Published - Sep 2019

Keywords

Hardware accelerator
Long short-term memory (LSTM)
Speech recognition
Structured sparsity weight compression

ASJC Scopus subject areas

Electrical and Electronic Engineering

Cite this

@article{8bd8fc4785094210bfd566240d1cdb60,

title = "An 8.93-{TOPS/W} {LSTM} Recurrent Neural Network Accelerator Featuring Hierarchical Coarse-Grain Sparsity With All Parameters Stored On-Chip",

abstract = "Long short-term memory (LSTM) networks are widely used for speech applications but pose difficulties for efficient implementation on hardware due to large weight storage requirements. We present an energy-efficient LSTM recurrent neural network (RNN) accelerator, featuring an algorithm-hardware co-optimized memory compression technique called hierarchical coarse-grain sparsity (HCGS). Aided by HCGS-based block-wise recursive weight compression, we demonstrate LSTM networks with up to 16× fewer weights while achieving minimal accuracy loss. The prototype chip fabricated in 65-nm LP CMOS achieves 8.93/7.22 TOPS/W for 2-/3-layer LSTM RNNs trained with HCGS for TIMIT/TED-LIUM corpora. ",

keywords = "Hardware accelerator, Long short-term memory (LSTM), Speech recognition, Structured sparsity weight compression",

author = "Deepak Kadetotad and Visar Berisha and Chaitali Chakrabarti and Seo, {Jae Sun}",

note = "Funding Information: Manuscript received May 30, 2019; revised August 8, 2019; accepted August 11, 2019. Date of publication October 15, 2019; date of current version October 15, 2019. This article was approved by Associate Editor Tobias Gemmeke. This work was supported in part by NSF under Grant 1652866, in part by Samsung, in part by ONR, and in part by C-BRIC, one of six centers in JUMP, an SRC program sponsored by DARPA. (Corresponding author: Deepak Kadetotad.) The authors are with the School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ 85281 USA (e-mail: dkadetot@asu.edu). Digital Object Identifier 10.1109/LSSC.2019.2936761 Publisher Copyright: {\textcopyright} 2018 IEEE.",

year = "2019",

month = sep,

doi = "10.1109/LSSC.2019.2936761",

language = "English (US)",

volume = "2",

pages = "119--122",

journal = "IEEE Solid-State Circuits Letters",

issn = "2573-9603",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "9",

}

TY  - JOUR
T1  - An 8.93-TOPS/W LSTM Recurrent Neural Network Accelerator Featuring Hierarchical Coarse-Grain Sparsity With All Parameters Stored On-Chip
AU  - Kadetotad, Deepak
AU  - Berisha, Visar
AU  - Chakrabarti, Chaitali
AU  - Seo, Jae Sun
N1  - Funding Information: Manuscript received May 30, 2019; revised August 8, 2019; accepted August 11, 2019. Date of publication October 15, 2019; date of current version October 15, 2019. This article was approved by Associate Editor Tobias Gemmeke. This work was supported in part by NSF under Grant 1652866, in part by Samsung, in part by ONR, and in part by C-BRIC, one of six centers in JUMP, an SRC program sponsored by DARPA. (Corresponding author: Deepak Kadetotad.) The authors are with the School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ 85281 USA (e-mail: dkadetot@asu.edu). Digital Object Identifier 10.1109/LSSC.2019.2936761 Publisher Copyright: © 2018 IEEE.
PY  - 2019/9
Y1  - 2019/9
N2  - Long short-term memory (LSTM) networks are widely used for speech applications but pose difficulties for efficient implementation on hardware due to large weight storage requirements. We present an energy-efficient LSTM recurrent neural network (RNN) accelerator, featuring an algorithm-hardware co-optimized memory compression technique called hierarchical coarse-grain sparsity (HCGS). Aided by HCGS-based block-wise recursive weight compression, we demonstrate LSTM networks with up to 16× fewer weights while achieving minimal accuracy loss. The prototype chip fabricated in 65-nm LP CMOS achieves 8.93/7.22 TOPS/W for 2-/3-layer LSTM RNNs trained with HCGS for TIMIT/TED-LIUM corpora.
AB  - Long short-term memory (LSTM) networks are widely used for speech applications but pose difficulties for efficient implementation on hardware due to large weight storage requirements. We present an energy-efficient LSTM recurrent neural network (RNN) accelerator, featuring an algorithm-hardware co-optimized memory compression technique called hierarchical coarse-grain sparsity (HCGS). Aided by HCGS-based block-wise recursive weight compression, we demonstrate LSTM networks with up to 16× fewer weights while achieving minimal accuracy loss. The prototype chip fabricated in 65-nm LP CMOS achieves 8.93/7.22 TOPS/W for 2-/3-layer LSTM RNNs trained with HCGS for TIMIT/TED-LIUM corpora.
KW  - Hardware accelerator
KW  - Long short-term memory (LSTM)
KW  - Speech recognition
KW  - Structured sparsity weight compression
UR  - http://www.scopus.com/inward/record.url?scp=85092134862&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85092134862&partnerID=8YFLogxK
U2  - 10.1109/LSSC.2019.2936761
DO  - 10.1109/LSSC.2019.2936761
M3  - Article
AN  - SCOPUS:85092134862
SN  - 2573-9603
VL  - 2
SP  - 119
EP  - 122
JO  - IEEE Solid-State Circuits Letters
JF  - IEEE Solid-State Circuits Letters
IS  - 9
M1  - 8877949
ER  -
