Long short-term memory (LSTM) networks are widely used for speech applications but pose difficulties for efficient implementation on hardware due to large weight storage requirements. We present an energy-efficient LSTM recurrent neural network (RNN) accelerator, featuring an algorithm-hardware co-optimized memory compression technique called hierarchical coarse-grain sparsity (HCGS). Aided by HCGS-based block-wise recursive weight compression, we demonstrate LSTM networks with up to 16× fewer weights while achieving minimal accuracy loss. The prototype chip fabricated in 65-nm LP CMOS achieves 8.93/7.22 TOPS/W for 2-/3-layer LSTM RNNs trained with HCGS for TIMIT/TED-LIUM corpora.
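To make the compression figure concrete, here is a minimal sketch of a two-level block-sparse mask in the spirit of HCGS. The abstract does not describe the selection rule, so everything specific here is an illustrative assumption: the function name `hcgs_mask`, random selection of blocks per block-row, the block sizes (16 and 4), and the 25% keep fraction at each level. It assumes the matrix dimensions divide evenly by the block sizes. Keeping one quarter of the blocks at each of two levels leaves 1/16 of the weights, which matches the "up to 16×" reduction quoted above.

```python
import random


def hcgs_mask(rows, cols, block_sizes, keep_fraction, seed=0):
    """Return a 0/1 mask with hierarchical block sparsity (illustrative only).

    At each level the current region is tiled into square blocks, and for
    every block-row a fixed fraction of the blocks is kept at random. Kept
    blocks are then subdivided using the next (smaller) block size.
    """
    rng = random.Random(seed)
    mask = [[0] * cols for _ in range(rows)]

    def fill(r0, c0, height, width, levels):
        if not levels:
            # Finest level reached: every weight in this block survives.
            for r in range(r0, r0 + height):
                for c in range(c0, c0 + width):
                    mask[r][c] = 1
            return
        b = levels[0]
        n_block_cols = width // b
        n_keep = max(1, round(n_block_cols * keep_fraction))
        for block_row in range(height // b):
            # Same number of kept blocks in every block-row.
            for block_col in rng.sample(range(n_block_cols), n_keep):
                fill(r0 + block_row * b, c0 + block_col * b, b, b, levels[1:])

    fill(0, 0, rows, cols, list(block_sizes))
    return mask


# Hypothetical 64x64 weight matrix, two levels, 4x reduction per level.
mask = hcgs_mask(64, 64, block_sizes=(16, 4), keep_fraction=0.25)
kept = sum(map(sum, mask))
compression = (64 * 64) // kept
```

Because each block-row keeps the same number of blocks, every row of the mask ends up with the same number of surviving weights. A regular pattern of this kind needs only a small block index per row rather than a per-weight index, which is one plausible reason a structured scheme suits on-chip storage.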

Original language: English (US)
Article number: 8877949
Pages (from-to): 119-122
Number of pages: 4
Journal: IEEE Solid-State Circuits Letters
Issue number: 9
State: Published - Sep 2019

Keywords


  • Hardware accelerator
  • Long short-term memory (LSTM)
  • Speech recognition
  • Structured sparsity weight compression

ASJC Scopus subject areas

  • Electrical and Electronic Engineering


Title: A 8.93-TOPS/W LSTM Recurrent Neural Network Accelerator Featuring Hierarchical Coarse-Grain Sparsity With All Parameters Stored On-Chip
