Domain-Specific Utterance End-Point Detection for Speech Recognition

Roland Maas; Ariya Rastrow; Kyle Goehner; Gautam Tiwari; Shaun Joseph; Björn Hoffmeister

2017

The task of automatically detecting the end of a device-directed user request is particularly challenging when users switch between short commands and long free-form utterances. While low-latency end-pointing configurations typically lead to a good user experience for short requests, such as “play music”, they can be too aggressive in domains with longer free-form queries, where users tend to pause noticeably between words and hence are easily cut off prematurely. We previously proposed an approach for accurate end-pointing by continuously estimating pause duration features over all active recognition hypotheses. In this paper, we study the behavior of these pause duration features and infer domain-dependent parametrizations. We furthermore propose to adapt the end-pointer aggressiveness on the fly by comparing the Viterbi scores of active short-command vs. long free-form decoding hypotheses. The experimental evaluation shows an 18% relative reduction in word error rate on free-form requests while maintaining low latency on short queries.
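The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Hypothesis` fields, the domain labels, the score-weighted pause estimate, and the two threshold values are all assumptions chosen for illustration. The sketch estimates a pause duration over all active hypotheses, then selects a short or long pause threshold depending on whether a short-command or a free-form hypothesis currently has the better Viterbi score.

```python
import math
from dataclasses import dataclass


@dataclass
class Hypothesis:
    log_score: float   # Viterbi log-score of an active decoding hypothesis
    domain: str        # "command" or "freeform" (hypothetical labels)
    pause_frames: int  # trailing non-speech frames on this hypothesis


def expected_pause(hyps):
    """Score-weighted mean trailing pause over all active hypotheses."""
    top = max(h.log_score for h in hyps)
    # Subtract the best score before exponentiating for numerical stability.
    weights = [math.exp(h.log_score - top) for h in hyps]
    total = sum(weights)
    return sum(w * h.pause_frames for w, h in zip(weights, hyps)) / total


def pause_threshold(hyps, command_frames=30, freeform_frames=80):
    """Use the threshold of whichever domain holds the best-scoring hypothesis.

    The default frame counts are illustrative assumptions, not tuned values.
    """
    best = {"command": -math.inf, "freeform": -math.inf}
    for h in hyps:
        best[h.domain] = max(best[h.domain], h.log_score)
    if best["command"] >= best["freeform"]:
        return command_frames
    return freeform_frames


def should_endpoint(hyps):
    """End-point once the estimated pause reaches the domain threshold."""
    return expected_pause(hyps) >= pause_threshold(hyps)
```

With the same 40-frame pause, a decoder state dominated by a short-command hypothesis would end-point under these assumed thresholds, while one dominated by a free-form hypothesis would keep listening.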