Exploiting large-scale teacher-student training for on-device acoustic models

Jing Liu; Rupak Vignesh Swaminathan; Sree Hari Krishnan Parthasarathi; Chunchuan Lyu; Thanasis Mouchtaris; Siegfried Kunzmann

2021

We present results from Alexa speech teams on semi-supervised learning (SSL) of acoustic models (AM) with experiments spanning over 3000 hours of GPU time, making our study one of the largest of its kind. We discuss SSL for AMs in a small-footprint setting, showing that a smaller-capacity model trained with 1 million hours of unsupervised data can outperform a baseline supervised system by 14.3% word error rate reduction (WERR). When the supervised data is increased seven-fold, our gains diminish to 7.1% WERR; to improve SSL efficiency in larger supervised data regimes, we employ step-wise distillation into a smaller model, obtaining a WERR of 14.4%. We then switch to SSL using larger student models in low-data regimes; in this setting, learning efficiency with unsupervised data is higher, and student models may outperform teacher models. We develop a theoretical sketch to explain this behavior.
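As a reading aid for the WERR figures above, here is a minimal sketch of how such a number is typically computed. It assumes WERR means the relative reduction in word error rate against the baseline, which the abstract does not state explicitly. The function name `werr` and the example WER values are hypothetical and are not taken from the paper.

```python
def werr(baseline_wer: float, new_wer: float) -> float:
    """Relative word error rate reduction, in percent.

    Assumed definition: 100 * (baseline - new) / baseline.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Hypothetical WERs chosen only to illustrate the arithmetic:
# a drop from 10.00% to 8.57% WER is a 14.3% relative reduction.
print(round(werr(10.0, 8.57), 1))
```

Under this definition, a WERR of 14.3% describes an improvement relative to the baseline system, not an absolute drop of 14.3 WER points.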
