Wake word (WW) spotting is challenging in far-field settings, not only because of interference during signal transmission but also because of the complexity of the acoustic environment. Traditional WW model training requires a large amount of in-domain, WW-specific data with substantial human annotation, which makes model building impractical when such data are lacking. In this paper we present data-efficient solutions to the challenges in WW modeling, such as domain mismatch, noisy conditions, and limited annotation. Our proposed system is composed of a multi-condition training pipeline with stratified data augmentation, which improves model robustness to a variety of predefined acoustic conditions, together with a semi-supervised learning pipeline that extracts WW and adversarial examples from an untranscribed speech corpus. Starting from only 10 hours of domain-mismatched WW audio, we are able to enlarge and enrich the training dataset by 20-100 times to capture the complexity of acoustic environments. Our experiments on real user data show that the proposed solutions match the performance of a production-grade model while reducing the amount of WW-specific data to collect by 97% and the annotation bandwidth by 86%.
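The stratified data augmentation described above can be illustrated with a minimal sketch. The paper does not specify the strata or SNR ranges; the condition names, weights, and SNR bounds below are hypothetical placeholders for "predefined acoustic conditions": each clean WW clip is assigned to a stratum by weight, then mixed with noise at an SNR drawn from that stratum's range, so every condition is represented in the enlarged training set.

```python
import numpy as np

# Hypothetical strata standing in for the paper's predefined acoustic
# conditions; names, weights, and SNR ranges are illustrative only.
STRATA = {
    "quiet":    {"snr_db": (15.0, 25.0), "weight": 0.3},
    "moderate": {"snr_db": (5.0, 15.0),  "weight": 0.4},
    "noisy":    {"snr_db": (-5.0, 5.0),  "weight": 0.3},
}

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise power ratio equals `snr_db`, then mix."""
    noise = noise[: len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12  # guard against silent noise clips
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise

def augment(clean, noise, rng):
    """Draw a stratum by weight, sample an SNR within its range, and mix."""
    names = list(STRATA)
    weights = [STRATA[n]["weight"] for n in names]
    stratum = rng.choice(names, p=weights)
    lo, hi = STRATA[stratum]["snr_db"]
    return mix_at_snr(clean, noise, rng.uniform(lo, hi)), stratum

# Toy example: a 1-second tone stands in for a clean wake-word clip.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noise = rng.standard_normal(16000)
mixed, stratum = augment(clean, noise, rng)
```

In a real pipeline the noise clips would come from recordings of the target acoustic conditions (and reverberation would typically be applied via room impulse responses), but the stratified sampling logic is the same.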