Improving ASR confidence scores for Alexa using acoustic and hypothesis embeddings
2019
In automatic speech recognition (ASR), confidence measures provide a quantitative estimate of the reliability of the generated hypothesis text. For personal assistant devices like Alexa, speech recognition errors are inevitable due to the growing number of applications. Confidence scores therefore give downstream consumers an important metric for gauging the correctness of the ASR hypothesis text and initiating appropriate actions. In this work, we aim to improve a baseline classifier-based confidence model by appending acoustic and hypothesis embeddings to its input features. Experimental results suggest that, relative to a baseline trained on decoder features only, appending acoustic embeddings yields larger improvements on insertion tokens, whereas appending hypothesis embeddings yields larger improvements on substitution tokens. Appending both acoustic and hypothesis embeddings gives the best results, with a 6% relative reduction in equal error rate (EER) and a 13% relative increase in normalized cross entropy (NCE) for a logistic regression classifier.
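As an illustration of the setup and the two metrics named above, the following is a minimal sketch, not the authors' implementation. It assumes per-token feature vectors are plain Python lists, that labels are 1 for a correct token and 0 for an incorrect one, and that NCE and EER follow their commonly used definitions. The function names (`append_embeddings`, `normalized_cross_entropy`, `equal_error_rate`) are hypothetical, and the EER is approximated by a simple threshold sweep.

```python
import math


def append_embeddings(decoder_feats, acoustic_emb, hypothesis_emb):
    """Concatenate per-token feature vectors into one classifier input.

    Hypothetical helper: the abstract only says the embeddings are
    appended to the input features, not how they are computed.
    """
    return list(decoder_feats) + list(acoustic_emb) + list(hypothesis_emb)


def normalized_cross_entropy(confidences, labels, eps=1e-12):
    """NCE = (H_max - H_conf) / H_max, using base-2 logarithms.

    H_max is the entropy of the prior probability that a token is correct;
    H_conf is the average cross entropy of the confidence scores.
    """
    n = len(labels)
    p_correct = sum(labels) / n
    if p_correct in (0.0, 1.0):
        raise ValueError("NCE is undefined when all tokens share one label")
    h_max = -(p_correct * math.log2(p_correct)
              + (1 - p_correct) * math.log2(1 - p_correct))
    h_conf = 0.0
    for c, y in zip(confidences, labels):
        c = min(max(c, eps), 1 - eps)  # clip to avoid log(0)
        h_conf -= math.log2(c) if y == 1 else math.log2(1 - c)
    h_conf /= n
    return (h_max - h_conf) / h_max


def equal_error_rate(confidences, labels):
    """Approximate EER by sweeping thresholds over the observed scores.

    A token is accepted when its confidence is >= the threshold. The
    returned value is the mean of the false-accept and false-reject rates
    at the threshold where the two rates are closest.
    """
    n_correct = sum(labels)
    n_incorrect = len(labels) - n_correct
    if n_correct == 0 or n_incorrect == 0:
        raise ValueError("EER needs both correct and incorrect tokens")
    best_gap, best_eer = None, None
    for t in sorted(set(confidences)) + [float("inf")]:
        false_accepts = sum(1 for c, y in zip(confidences, labels)
                            if y == 0 and c >= t)
        false_rejects = sum(1 for c, y in zip(confidences, labels)
                            if y == 1 and c < t)
        far = false_accepts / n_incorrect
        frr = false_rejects / n_correct
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer
```

Under these definitions, a lower EER and a higher NCE both indicate better confidence estimates, which matches the direction of the improvements reported above. An NCE of 0 corresponds to a model that outputs only the prior probability of a token being correct.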