Compressing LSTM networks with hierarchical coarse-grain sparsity

Deepak Kadetotad; Jian Meng; Visar Berisha; Chaitali Chakrabarti; Jae Sun Seo

doi:10.21437/Interspeech.2020-1270

Research output: Contribution to journal › Conference article › peer-review

1 Scopus citation

Abstract

The long short-term memory (LSTM) network is one of the most widely used recurrent neural networks (RNNs) for automatic speech recognition (ASR), but is parametrized by millions of parameters. This makes it prohibitive for memory-constrained hardware accelerators as the storage demand causes higher dependence on off-chip memory, which bottlenecks latency and power. In this paper, we propose a new LSTM training technique based on hierarchical coarse-grain sparsity (HCGS), which enforces hierarchical structured sparsity by randomly dropping static block-wise connections between layers. HCGS maintains the same hierarchical structured sparsity throughout training and inference; this reduces weight storage for both training and inference hardware systems. We also jointly optimize in-training quantization with HCGS on 2-/3-layer LSTM networks for the TIMIT and TED-LIUM corpora. With 16× structured compression and 6-bit weight precision, we achieved a phoneme error rate (PER) of 16.9% for TIMIT and a word error rate (WER) of 18.9% for TED-LIUM, showing the best trade-off between error rate and LSTM memory compression compared to prior works.
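The block-wise masking described in the abstract can be illustrated with a minimal sketch. The abstract does not give block sizes, the number of hierarchy levels, or the selection rule, so everything concrete below is an assumption made for illustration: two levels, square blocks, one of every four blocks kept per block-row at each level, and a 32-bit dense baseline for the storage estimate. The function names `block_keep_mask`, `hcgs_mask`, and `compression_vs_dense` are hypothetical and are not taken from the paper.

```python
import random


def block_keep_mask(n_rows, n_cols, block, keep_every, rng):
    """Return a 0/1 mask that keeps a fixed number of randomly chosen
    block x block tiles in every block-row (assumed selection rule)."""
    mask = [[0] * n_cols for _ in range(n_rows)]
    blocks_per_row = n_cols // block
    n_keep = blocks_per_row // keep_every
    for br in range(n_rows // block):
        # Choose which tiles of this block-row survive; the choice is
        # made once and never changes afterwards (a static mask).
        for bc in rng.sample(range(blocks_per_row), n_keep):
            for r in range(br * block, (br + 1) * block):
                for c in range(bc * block, (bc + 1) * block):
                    mask[r][c] = 1
    return mask


def hcgs_mask(n_rows, n_cols, big, small, keep_every, seed=0):
    """Two-level mask: drop large blocks first, then drop small blocks
    inside each large block that survived the first level."""
    rng = random.Random(seed)
    level1 = block_keep_mask(n_rows, n_cols, big, keep_every, rng)
    mask = [[0] * n_cols for _ in range(n_rows)]
    for r0 in range(0, n_rows, big):
        for c0 in range(0, n_cols, big):
            if level1[r0][c0] == 0:
                continue  # whole large block was dropped at level 1
            inner = block_keep_mask(big, big, small, keep_every, rng)
            for r in range(big):
                for c in range(big):
                    mask[r0 + r][c0 + c] = inner[r][c]
    return mask


def compression_vs_dense(mask, weight_bits=6, dense_bits=32):
    """Storage ratio of a dense matrix (assumed 32-bit weights) to the
    kept weights at `weight_bits`; index overhead is ignored."""
    total = len(mask) * len(mask[0])
    kept = sum(sum(row) for row in mask)
    return (total * dense_bits) / (kept * weight_bits)


if __name__ == "__main__":
    mask = hcgs_mask(64, 64, big=16, small=4, keep_every=4, seed=0)
    kept = sum(sum(row) for row in mask)
    print("structured compression:", (64 * 64) // kept)
    print("vs 32-bit dense at 6 bits:", round(compression_vs_dense(mask), 2))
```

Under these assumptions each level keeps one quarter of the weights, so the two levels together give the 16× structured compression quoted in the abstract. Because the mask is generated once from a seed and reused, the same sparsity pattern applies during training and inference, and only the kept weights need to be stored.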

Original language	English (US)
Pages (from-to)	21-25
Number of pages	5
Journal	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume	2020-October
DOIs	https://doi.org/10.21437/Interspeech.2020-1270
State	Published - 2020
Event	21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020 - Shanghai, China
Duration	Oct 25 2020 → Oct 29 2020

Keywords

Long short-term memory
Speech recognition
Structured sparsity
Weight compression

ASJC Scopus subject areas

Language and Linguistics
Human-Computer Interaction
Signal Processing
Software
Modeling and Simulation

Access to Document

10.21437/Interspeech.2020-1270

Cite this

@article{48a82a35bbbc45a7b0ab09bb3265f9cf,

title = "Compressing LSTM networks with hierarchical coarse-grain sparsity",

abstract = "The long short-term memory (LSTM) network is one of the most widely used recurrent neural networks (RNNs) for automatic speech recognition (ASR), but is parametrized by millions of parameters. This makes it prohibitive for memory-constrained hardware accelerators as the storage demand causes higher dependence on off-chip memory, which bottlenecks latency and power. In this paper, we propose a new LSTM training technique based on hierarchical coarse-grain sparsity (HCGS), which enforces hierarchical structured sparsity by randomly dropping static block-wise connections between layers. HCGS maintains the same hierarchical structured sparsity throughout training and inference; this reduces weight storage for both training and inference hardware systems. We also jointly optimize in-training quantization with HCGS on 2-/3-layer LSTM networks for the TIMIT and TED-LIUM corpora. With 16× structured compression and 6-bit weight precision, we achieved a phoneme error rate (PER) of 16.9% for TIMIT and a word error rate (WER) of 18.9% for TED-LIUM, showing the best trade-off between error rate and LSTM memory compression compared to prior works.",

keywords = "Long short-term memory, Speech recognition, Structured sparsity, Weight compression",

author = "Deepak Kadetotad and Jian Meng and Visar Berisha and Chaitali Chakrabarti and Seo, {Jae Sun}",

note = "Funding Information: This work was in part supported by NSF grant 1652866, Samsung, ONR, and C-BRIC, one of six centers in JUMP, a SRC program sponsored by DARPA. Publisher Copyright: Copyright {\textcopyright} 2020 ISCA; 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020; Conference date: 25-10-2020 through 29-10-2020",

year = "2020",

doi = "10.21437/Interspeech.2020-1270",

language = "English (US)",

volume = "2020-October",

pages = "21--25",

journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",

issn = "2308-457X",

}

TY - JOUR

T1 - Compressing LSTM networks with hierarchical coarse-grain sparsity

AU - Kadetotad, Deepak

AU - Meng, Jian

AU - Berisha, Visar

AU - Chakrabarti, Chaitali

AU - Seo, Jae Sun

N1 - Funding Information: This work was in part supported by NSF grant 1652866, Samsung, ONR, and C-BRIC, one of six centers in JUMP, a SRC program sponsored by DARPA. Publisher Copyright: Copyright © 2020 ISCA

PY - 2020

Y1 - 2020

N2 - The long short-term memory (LSTM) network is one of the most widely used recurrent neural networks (RNNs) for automatic speech recognition (ASR), but is parametrized by millions of parameters. This makes it prohibitive for memory-constrained hardware accelerators as the storage demand causes higher dependence on off-chip memory, which bottlenecks latency and power. In this paper, we propose a new LSTM training technique based on hierarchical coarse-grain sparsity (HCGS), which enforces hierarchical structured sparsity by randomly dropping static block-wise connections between layers. HCGS maintains the same hierarchical structured sparsity throughout training and inference; this reduces weight storage for both training and inference hardware systems. We also jointly optimize in-training quantization with HCGS on 2-/3-layer LSTM networks for the TIMIT and TED-LIUM corpora. With 16× structured compression and 6-bit weight precision, we achieved a phoneme error rate (PER) of 16.9% for TIMIT and a word error rate (WER) of 18.9% for TED-LIUM, showing the best trade-off between error rate and LSTM memory compression compared to prior works.

KW - Long short-term memory

KW - Speech recognition

KW - Structured sparsity

KW - Weight compression

UR - http://www.scopus.com/inward/record.url?scp=85098177563&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85098177563&partnerID=8YFLogxK

U2 - 10.21437/Interspeech.2020-1270

DO - 10.21437/Interspeech.2020-1270

M3 - Conference article

AN - SCOPUS:85098177563

SN - 2308-457X

VL - 2020-October

SP - 21

EP - 25

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

T2 - 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020

Y2 - 25 October 2020 through 29 October 2020

ER -
