Building a robust word-level wakeword verification network

Rajath Kumar; Mike Rodehorst; Joe Wang; Jiacheng Gu; Brian Kulis


2020


Wakeword detection is responsible for switching on downstream systems in a voice-activated device. To prevent a response when the wakeword is detected by mistake, a secondary network is often used to verify the detected wakeword. Published verification approaches are built on Automatic Speech Recognition (ASR) biased towards the wakeword. This approach has several drawbacks, including high model complexity and the need for large-vocabulary training data. To address these shortcomings, we propose to use a large receptive field (LRF) word-level wakeword model, and in particular, a convolutional-recurrent-attention (CRA) network. CRA networks use a strided, small-receptive-field convolutional front-end followed by fixed time-step recurrent layers optimized to model the temporal phonetic dependencies within the wakeword. We experimentally show that this type of modeling makes the system robust to errors in the wakeword location estimated by the detection network. The proposed CRA network significantly outperforms previous baselines, including an LRF whole-word convolutional network and a 2-stage DNN-HMM system. Additionally, we study the importance of pre- and post-wakeword context. Finally, the CRA network has significantly fewer model parameters and multiplications, which makes it suitable for real-world production applications.
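To make the CRA data flow concrete, here is a minimal forward-pass sketch in plain Python. The abstract does not give layer sizes, the recurrent cell type, or the attention formulation, so everything below is an assumption for illustration: the function names (`receptive_field`, `conv1d_strided`, `tanh_rnn`, `attention_pool`, `cra_score`, `init_params`) are hypothetical, a plain tanh recurrent cell stands in for whatever recurrent layers the paper uses, attention is simple dot-product pooling over time, and the weights are random and untrained. It only shows how a strided front-end with a small receptive field turns a fixed-length window into a fixed number of recurrent time steps, which attention then pools into a single verification score.

```python
import math
import random
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def receptive_field(kernels: List[int], strides: List[int]) -> int:
    """Number of input frames seen by one output step of stacked 1-D convolutions."""
    rf, jump = 1, 1
    for k, s in zip(kernels, strides):
        rf += (k - 1) * jump  # each layer widens the view by (k - 1) current jumps
        jump *= s             # strides compound the spacing between output steps
    return rf


def conv1d_strided(x: Matrix, filters: List[Matrix], stride: int) -> Matrix:
    """'Valid' strided convolution over time, followed by ReLU.

    x: T frames x F features; filters: C kernels, each k frames x F features.
    Output: floor((T - k) / stride) + 1 steps x C channels.
    """
    k, n_feat = len(filters[0]), len(x[0])
    out = []
    for t in range(0, len(x) - k + 1, stride):
        step = []
        for kern in filters:
            acc = sum(kern[i][f] * x[t + i][f] for i in range(k) for f in range(n_feat))
            step.append(max(0.0, acc))
        out.append(step)
    return out


def tanh_rnn(x: Matrix, w_in: Matrix, w_rec: Matrix) -> Matrix:
    """Plain tanh recurrent layer (assumed stand-in); returns every hidden state."""
    h = [0.0] * len(w_rec)
    states = []
    for step in x:
        # The right-hand side is fully evaluated with the previous h before rebinding.
        h = [
            math.tanh(
                sum(w_in[j][i] * step[i] for i in range(len(step)))
                + sum(w_rec[j][i] * h[i] for i in range(len(h)))
            )
            for j in range(len(h))
        ]
        states.append(h)
    return states


def attention_pool(states: Matrix, v: Vector) -> Tuple[Vector, Vector]:
    """Dot-product attention over time: softmax weights, then a weighted sum."""
    scores = [sum(vi * hi for vi, hi in zip(v, h)) for h in states]
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    alphas = [e / z for e in exps]
    context = [sum(a * h[j] for a, h in zip(alphas, states)) for j in range(len(states[0]))]
    return context, alphas


def cra_score(features: Matrix, params: Dict, stride: int = 2) -> Tuple[float, Vector]:
    """Conv front-end -> recurrent layer -> attention -> sigmoid verification score."""
    conv = conv1d_strided(features, params["conv"], stride)
    states = tanh_rnn(conv, params["w_in"], params["w_rec"])
    context, alphas = attention_pool(states, params["v"])
    logit = sum(w * c for w, c in zip(params["w_out"], context)) + params["b_out"]
    return 1.0 / (1.0 + math.exp(-logit)), alphas


def init_params(seed: int, n_feat: int = 40, n_chan: int = 8, k: int = 3,
                n_hidden: int = 16, scale: float = 0.1) -> Dict:
    """Random, untrained weights with hypothetical sizes."""
    rng = random.Random(seed)

    def mat(rows: int, cols: int) -> Matrix:
        return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

    return {
        "conv": [mat(k, n_feat) for _ in range(n_chan)],
        "w_in": mat(n_hidden, n_chan),
        "w_rec": mat(n_hidden, n_hidden),
        "v": [rng.uniform(-scale, scale) for _ in range(n_hidden)],
        "w_out": [rng.uniform(-scale, scale) for _ in range(n_hidden)],
        "b_out": 0.0,
    }
```

For example, with these assumed settings a 100-frame window of 40-dimensional features passed through a kernel-3, stride-2 front-end yields floor((100 - 3) / 2) + 1 = 49 recurrent steps. Each step sees only 3 input frames, while the recurrent layer and the attention pooling span the whole window. The step count stays fixed as long as the verification window length is fixed, which is one plausible reading of "fixed time-step" in the abstract.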
