Continuous model improvement for language understanding with machine translation
Scaling conversational personal assistants to a multitude of languages places high demands on collecting and labeling data, a setting in which cross-lingual learning techniques can help reconcile the need for well-performing natural language understanding (NLU) with the goal of supporting many languages without incurring unacceptable cost. In this paper, we show that automatically annotating unlabeled utterances with machine translation in an offline fashion and adding them to the training data can improve the performance of existing NLU features for low-resource languages, a setting in which the straightforward translate-test approach considered in the existing literature would fail to meet the latency requirements of a live environment. We demonstrate the effectiveness of our method with intrinsic and extrinsic evaluations on a real-world commercial dialog system in German. We show that 56% of the resulting automatically labeled utterances matched the ground-truth labels exactly. Moreover, we observe significant performance improvements in an extrinsic evaluation setting when manually labeled data is available only in small quantities.
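As a rough illustration of the offline translate-and-label pipeline sketched above, the following minimal Python example shows one plausible shape of the annotation step. The function names `translate_to_english` and `english_nlu_intent` are hypothetical stand-ins for a German-to-English MT service and a trained source-language NLU model, neither of which the abstract specifies; slot-level label projection is likewise omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class LabeledUtterance:
    text: str    # utterance in the target (low-resource) language, e.g. German
    intent: str  # intent label obtained via the source-language NLU model


# Placeholder stubs: the paper does not name its MT system or NLU model,
# so these two functions are purely illustrative assumptions.
def translate_to_english(utterance_de: str) -> str:
    ...  # call a German-to-English MT service here


def english_nlu_intent(utterance_en: str) -> str:
    ...  # run a trained English NLU model and return its predicted intent


def auto_label(unlabeled_de: list[str]) -> list[LabeledUtterance]:
    """Offline translate-and-label loop: machine-translate each unlabeled
    German utterance into English, classify the translation with the
    English NLU model, and attach the predicted label to the original
    German text. Because this runs offline, MT latency is not an issue,
    unlike the translate-test approach at serving time."""
    return [
        LabeledUtterance(
            text=utt_de,
            intent=english_nlu_intent(translate_to_english(utt_de)),
        )
        for utt_de in unlabeled_de
    ]


# The resulting records would then be appended to the German training data
# and the German NLU model retrained on the augmented set.
```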