Comparing data augmentation and annotation standardization to improve end-to-end spoken language understanding models
All-neural end-to-end (E2E) Spoken Language Understanding (SLU) models can outperform traditional compositional SLU models, but they require high-quality training data with both audio and annotations. In particular, they struggle on “golden utterances”, which are essential for defining and supporting features but may lack sufficient training data. In this paper, we compare two data-centric AI methods for improving performance on golden utterances: improving the annotation quality of existing training utterances and augmenting the training data with varying amounts of synthetic data. Our experimental results show improvements with both methods, and in particular that augmenting with synthetic data effectively addresses errors caused both by inconsistent training annotations and by a lack of training data. This method reduces the intent recognition error rate (IRER) on our golden utterance test set by 93% relative to the baseline, without a negative impact on other test metrics.