Knowledge distillation via module replacing for automatic speech recognition with recurrent neural network transducer

Kaiqi Zhao; Hieu Duy Nguyen; Animesh Jain; Nathan Susanj; Thanasis Mouchtaris; Lokesh Gupta; Ming Zhao

2022

Automatic Speech Recognition (ASR) is increasingly used by edge applications such as intelligent virtual assistants. However, state-of-the-art ASR models such as the Recurrent Neural Network Transducer (RNN-T) are computationally intensive for resource-constrained edge devices. Knowledge Distillation (KD) is a promising approach to model compression in which a large model (the "teacher") is used to train a small model (the "student"). This paper proposes a novel KD method for RNN-T called Log-Curriculum based Module Replacing (LCMR). LCMR compresses RNN-T and addresses its unique characteristics by replacing teacher modules, each containing multiple Long Short-Term Memory (LSTM) or Dense layers, with substitute student modules that contain fewer LSTM/Dense layers. To further improve performance, LCMR employs a novel nonlinear, Curriculum Learning driven replacement strategy that updates the replacement rates with a dynamic smoothing mechanism. Under LCMR, the student and teacher interact at the gradient level and transfer knowledge more effectively than in conventional KD. Evaluation shows that LCMR reduces the word error rate (WER) by 14.47% to 33.24% relative to conventional KD.
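The abstract describes module replacing only at a high level. The toy sketch below is one possible reading of the general idea and is not the authors' implementation. Each teacher module is independently swapped for a student substitute with some probability, and a log-shaped curriculum raises that probability over training. The schedule's functional form, the constants `p0` and `k`, and the names `log_curriculum_rate` and `replaced_forward` are all hypothetical. The dynamic smoothing mechanism is omitted because the abstract does not specify it.

```python
import math
import random
from typing import Callable, Sequence

# Toy stand-in for a module: a callable from activation to activation.
# In an RNN-T these would be multi-layer LSTM/Dense blocks (teacher) and
# shallower substitutes (student); scalars keep the sketch self-contained.
Module = Callable[[float], float]


def log_curriculum_rate(step: int, total_steps: int,
                        p0: float = 0.3, k: float = 9.0) -> float:
    """Hypothetical log-shaped replacement-rate schedule.

    Starts at p0 when step == 0 and reaches 1.0 at total_steps. The
    logarithm makes the rate rise quickly early and flatten later.
    p0, k, and the functional form are illustrative assumptions.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return p0 + (1.0 - p0) * math.log1p(k * frac) / math.log1p(k)


def replaced_forward(x: float,
                     teacher_modules: Sequence[Module],
                     student_modules: Sequence[Module],
                     rate: float,
                     rng: random.Random) -> float:
    """Forward pass through the compound teacher/student model.

    Each teacher module is independently swapped for its student
    substitute with probability `rate`; otherwise the teacher runs.
    """
    if len(teacher_modules) != len(student_modules):
        raise ValueError("each teacher module needs one student substitute")
    for teacher, student in zip(teacher_modules, student_modules):
        x = student(x) if rng.random() < rate else teacher(x)
    return x


# Usage with toy two-module models.
teachers = [lambda v: v * 2.0, lambda v: v + 1.0]
students = [lambda v: v * 2.1, lambda v: v + 0.9]
rng = random.Random(0)

rates = [log_curriculum_rate(s, 100) for s in range(101)]
all_teacher = replaced_forward(3.0, teachers, students, 0.0, rng)  # 7.0
all_student = replaced_forward(3.0, teachers, students, 1.0, rng)  # ~7.2
mixed = replaced_forward(3.0, teachers, students, rates[50], rng)
```

In a real training setup, the teacher modules would typically be frozen while the student modules receive gradients through the compound model. This is one way the student and teacher could interact at the gradient level. Once the rate reaches 1.0, only student modules remain, and the compressed model can be extracted on its own.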