Domain adaptation with external off-policy acoustic catalogs for scalable contextual end-to-end automated speech recognition

David M. Chan; Shalini Ghosh; Ariya Rastrow; Björn Hoffmeister

2023

Despite improvements to the generalization performance of automated speech recognition (ASR) models, specializing ASR models for downstream tasks remains challenging, primarily due to reduced data availability (necessitating increased data collection) and rapidly shifting data distributions (requiring more frequent model fine-tuning). In this work, we investigate the potential of leveraging external knowledge, particularly through off-policy generated text-to-speech key-value stores, to allow for flexible post-training adaptation to new data distributions. In our approach, audio embeddings captured from text-to-speech are used, along with semantic text embeddings, to bias ASR via an approximate k-nearest-neighbor (KNN) based attentive fusion step. Our experiments on LibriSpeech and in-house voice assistant/search datasets show that the proposed approach can reduce domain adaptation time by up to 1K GPU-hours while providing up to 3% WER improvement compared to a fine-tuning baseline, suggesting a promising approach for adapting production ASR systems in challenging zero- and few-shot scenarios.
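
The KNN-based attentive fusion step described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the store's keys are audio embeddings of text-to-speech renderings and its values are the matching text embeddings, it uses exact dot-product search in place of an approximate KNN index, and it fuses by concatenation. The function name `knn_attentive_fusion` and all parameters are hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def knn_attentive_fusion(query, keys, values, k=2):
    """Retrieve the k best-matching catalog entries and fuse them into the query.

    query  : ASR audio embedding for the current utterance (assumed input)
    keys   : catalog audio embeddings, e.g. from text-to-speech (assumed)
    values : catalog text embeddings paired with the keys (assumed)
    """
    # Exact search stands in for the approximate KNN lookup in the abstract.
    scores = [dot(query, key) for key in keys]
    top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]

    # Scaled softmax over the retrieved neighbors only (attention weights).
    scale = math.sqrt(len(query))
    peak = max(scores[i] for i in top)
    exps = [math.exp((scores[i] - peak) / scale) for i in top]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Attention-weighted sum of the neighbors' value embeddings.
    dim = len(values[0])
    context = [sum(w * values[i][d] for w, i in zip(weights, top))
               for d in range(dim)]

    # Fusion by concatenation is an assumption; other fusion schemes are possible.
    fused = list(query) + context
    return top, weights, fused


# Toy catalog with three entries of dimension 2.
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
top, weights, fused = knn_attentive_fusion([1.0, 0.0], keys, values, k=2)
```

Because the catalog is only read at inference time, new entries can be added to `keys` and `values` without retraining the model, which is the kind of post-training adaptation the abstract refers to.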
