Accelerator-aware training for transducer-based speech recognition

Suhaila Shakiah; Rupak Vignesh Swaminathan; Hieu Duy Nguyen; Raviteja Chinta; Tariq Afzal; Nathan Susanj; Thanasis Mouchtaris; Grant Strimel; Ariya Rastrow

2022

Machine learning model weights and activations are represented in full precision during training. This leads to performance degradation at runtime when the model is deployed on neural network accelerator (NNA) chips, which use highly parallelized fixed-point arithmetic to improve runtime memory and latency. In this work, we replicate the NNA operators during the training phase, accounting in back-propagation for the degradation due to low-precision inference on the NNA. Our proposed method efficiently emulates NNA operations, thus forgoing the need to transfer quantization-error-prone data to the central processing unit (CPU), ultimately reducing the user-perceived latency (UPL). We apply our approach to the Recurrent Neural Network-Transducer (RNN-T), an attractive architecture for on-device streaming speech recognition tasks. We train and evaluate models on 270K hours of English data and show a 5-7% improvement in engine latency while avoiding up to 10% relative degradation in word error rate (WER).
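The abstract does not specify the NNA operators themselves, so the following is only a generic sketch of the underlying idea: emulate fixed-point arithmetic in the forward pass while keeping a full-precision weight for the update. The function names, the 8-bit word with 4 fractional bits, and the use of a straight-through estimator for the gradient are all assumptions made for illustration, not details taken from the paper.

```python
def quantize_fixed_point(x, frac_bits=4, total_bits=8):
    """Round x onto a signed fixed-point grid and saturate to its range.

    Assumed format: total_bits-wide signed integer with frac_bits
    fractional bits. Python's round() rounds ties to even.
    """
    scale = 1 << frac_bits
    qmin = -(1 << (total_bits - 1))
    qmax = (1 << (total_bits - 1)) - 1
    q = round(x * scale)
    q = max(qmin, min(qmax, q))
    return q / scale


def ste_grad(x, upstream, frac_bits=4, total_bits=8):
    """Straight-through estimator (an assumed choice of gradient rule).

    Rounding is treated as the identity in the backward pass; the
    gradient is zeroed where the value would saturate.
    """
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1)) / scale
    hi = ((1 << (total_bits - 1)) - 1) / scale
    return upstream if lo <= x <= hi else 0.0


def train_step(w, x, target, lr=0.1):
    """One gradient step on a single multiply with an emulated low-precision forward pass.

    w stays in full precision; only the forward computation is quantized,
    so the squared-error loss reflects the low-precision result.
    """
    wq = quantize_fixed_point(w)
    xq = quantize_fixed_point(x)
    pre = wq * xq
    y = quantize_fixed_point(pre)      # emulated accelerator output
    err = y - target
    grad_pre = ste_grad(pre, 2.0 * err)  # through the output quantizer
    grad_w = ste_grad(w, grad_pre * xq)  # through the weight quantizer
    return w - lr * grad_w, err * err


if __name__ == "__main__":
    w = 0.5
    for step in range(5):
        w, loss = train_step(w, x=1.0, target=1.0)
        print(step, w, loss)
```

Because the loss is computed on the quantized output, the update sees the same rounding and saturation error that low-precision inference would introduce, which is the general motivation the abstract describes.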
