PSVD: Post-training compression of LSTM-based RNN-T models
RNN-T has received a lot of attention recently since it achieves state-of-the-art WER in automatic speech recognition. To run an RNN-T model in real time on resource-limited edge devices, model compression is often required. However, typical compression methods remain challenging to apply to RNN-T. First, fine-tuning demands substantial time and computing resources (e.g., up to several weeks even with multiple GPUs) due to the large size of training speech datasets and the slow training speed of LSTMs. Second, the ASR training pipeline and datasets are often proprietary and not publicly available, making it hard to fine-tune the pre-trained model after compression. In this paper, we propose PSVD (Post-training SVD), which can effectively improve the WER of an SVD-compressed RNN-T model without requiring the large training dataset or costly back-propagation. Using only a small amount of test data, PSVD quickly post-trains an SVD-compressed RNN-T model by leveraging lightweight linear regression. In particular, we observe that multiple iterations of layer-sequential linear regression are effective in optimizing the compressed model. Compared to SVD compression without fine-tuning, PSVD improves WER by up to 8.36% at a fraction of the compute time. To the best of our knowledge, our work is the first to apply post-training compression to LSTM and RNN-T models.
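To illustrate the core idea, the sketch below shows rank-r SVD compression of a single dense weight matrix followed by a closed-form linear-regression update of one factor against the uncompressed layer's outputs on a small calibration batch. This is a minimal numpy sketch under stated assumptions, not the paper's actual PSVD procedure: the true method operates on LSTM gates layer-sequentially and iterates, and all names here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dense weight matrix standing in for one LSTM projection.
W = rng.standard_normal((256, 512))

# Rank-r SVD compression: W ~= U_r @ V_r, which saves parameters
# whenever r * (m + n) < m * n.
r = 32
U, s, Vt = np.linalg.svd(W, full_matrices=False)
U_r = U[:, :r] * s[:r]          # shape (256, r)
V_r = Vt[:r, :]                 # shape (r, 512)

# Small calibration batch standing in for the test data PSVD uses.
X = rng.standard_normal((1000, 512))
Y_full = X @ W.T                # outputs of the uncompressed layer

# Post-training via linear regression: hold V_r fixed and refit U_r in
# closed form so the compressed layer best matches Y_full in the
# least-squares sense (no back-propagation needed).
H = X @ V_r.T                   # (1000, r) low-rank activations
A, *_ = np.linalg.lstsq(H, Y_full, rcond=None)
U_r_new = A.T                   # refitted (256, r) factor

err_before = np.linalg.norm(Y_full - H @ U_r.T)
err_after = np.linalg.norm(Y_full - H @ U_r_new.T)
# Least-squares refit can only reduce the reconstruction error.
print(err_after <= err_before)
```

In the full method, this regression step would be applied to each compressed layer in sequence (using the previous layers' compressed outputs as inputs) and repeated for multiple passes.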