Abstract
Automatic Speech Recognition (ASR) is increasingly used by edge applications such as intelligent virtual assistants. However, state-of-the-art ASR models such as Recurrent Neural Network - Transducer (RNN-T) are computationally intensive on resource-constrained edge devices. Knowledge Distillation (KD) is a promising approach to compress large models by using a large model (”teacher”) to train a small model (”student”). This paper proposes a novel KD method called Log-Curriculum based Module Replacing (LCMR) for RNN-T. LCMR compresses RNN-T and addresses its unique characteristics by replacing teacher modules including multiple LSTM/Dense layers with substitutional student modules that contain less Long Short Term Memory (LSTM)/Dense layers. LCMR employs a novel nonlinear Curriculum Learning driven replacement strategy to further improve the performance by updating replacing rates with a dynamic, smoothing mechanism. Under LCMR, the student and teacher are able to interact at gradient level, and tranfser knowledge more effectively than conventional KD. Evaluation shows that LCMR reduces word-error-rate (WER) by 14.47%-33.24% relative compared to conventional KD.
Original language | English (US) |
---|---|
Pages (from-to) | 4436-4440 |
Number of pages | 5 |
Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
Volume | 2022-September |
DOIs | |
State | Published - 2022 |
Event | 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022 - Incheon, Korea, Republic of Duration: Sep 18 2022 → Sep 22 2022 |
Keywords
- Knowledge Distillation
- Model Compression
- Module Replacing
- RNN-T
ASJC Scopus subject areas
- Language and Linguistics
- Human-Computer Interaction
- Signal Processing
- Software
- Modeling and Simulation