Long short-term memory (LSTM) networks are widely used in speech applications but are difficult to implement efficiently in hardware because of their large weight storage requirements. We present an energy-efficient LSTM recurrent neural network (RNN) accelerator featuring an algorithm-hardware co-optimized memory compression technique called hierarchical coarse-grain sparsity (HCGS). Aided by HCGS-based block-wise recursive weight compression, we demonstrate LSTM networks with up to 16× fewer weights and minimal accuracy loss. The prototype chip, fabricated in 65-nm LP CMOS, achieves energy efficiencies of 8.93/7.22 TOPS/W for 2-/3-layer LSTM RNNs trained with HCGS on the TIMIT/TED-LIUM corpora.
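HCGS retains only a structured subset of weight blocks, selected hierarchically at coarse and finer granularities. The sketch below shows one way such a two-level block mask could be built; the block sizes (64 and 16), the 1/4 retention ratio per level, and random block selection are illustrative assumptions rather than the paper's exact scheme, though two levels at 1/4 retention each do yield the 16× weight reduction mentioned above.

```python
# Minimal sketch of a two-level hierarchical coarse-grain sparsity (HCGS) mask.
# Block sizes, retention ratios, and random selection are assumptions for
# illustration; the paper's actual block dimensions and selection/training
# procedure are not reproduced here.
import numpy as np

def hcgs_mask(rows, cols, block_l1=64, keep_l1=0.25,
              block_l2=16, keep_l2=0.25, rng=None):
    """Keep keep_l1 of coarse blocks; within each surviving coarse block,
    keep keep_l2 of finer sub-blocks. With 0.25 x 0.25 retention, roughly
    1/16 of the weights survive (16x compression on average)."""
    rng = np.random.default_rng() if rng is None else rng
    mask = np.zeros((rows, cols), dtype=np.float32)
    for r in range(0, rows, block_l1):
        for c in range(0, cols, block_l1):
            if rng.random() >= keep_l1:
                continue  # drop this coarse block entirely
            # Level 2: select finer sub-blocks inside the surviving coarse block.
            for rr in range(r, min(r + block_l1, rows), block_l2):
                for cc in range(c, min(c + block_l1, cols), block_l2):
                    if rng.random() < keep_l2:
                        mask[rr:rr + block_l2, cc:cc + block_l2] = 1.0
    return mask

# Usage: mask a hypothetical LSTM layer weight matrix; only the surviving
# blocks would need to be stored (and indexed) on chip.
W = np.random.randn(1024, 512).astype(np.float32)
M = hcgs_mask(*W.shape)
W_sparse = W * M
print(f"fraction of weights kept: {M.mean():.3f}")  # ~1/16 on average
```

Because the surviving weights form regular blocks rather than arbitrary scattered entries, the block structure keeps the index overhead and on-chip memory access pattern simple, which is the motivation for structured (as opposed to unstructured) sparsity in a hardware accelerator.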
- Hardware accelerator
- Long short-term memory (LSTM)
- Speech recognition
- Structured sparsity weight compression