Data-efficient paraphrase generation to bootstrap intent classification and slot labeling for new features in task-oriented dialog systems

Shailza Jolly; Tobias Falke; Caglar Tirkaz; Daniil Sorokin

2020

Recent progress in advanced neural models has pushed the performance of task-oriented dialog systems to near-perfect accuracy on existing benchmark datasets for intent classification and slot labeling. However, in evolving real-world dialog systems, where new functionality is regularly added, a major additional challenge is the lack of annotated training data for such new functionality, as the necessary data collection efforts are laborious and time-consuming. A potential solution to reduce the effort is to augment initial seed data by paraphrasing existing utterances automatically. In this paper, we propose a new, data-efficient approach following this idea. Using an interpretation-to-text model for paraphrase generation, we are able to rely on existing dialog system training data, and, in combination with shuffling-based sampling techniques, we can obtain diverse and novel paraphrases from small amounts of seed data. In experiments on a public dataset and with a real-world dialog system, we observe improvements for both intent classification and slot labeling, demonstrating the usefulness of our approach.
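The abstract does not spell out how an interpretation is fed to the generation model or how the shuffling works, so the following is only an illustrative sketch under stated assumptions. It assumes an interpretation is an intent name plus a set of slot name/value pairs, that it is serialized into a flat string, and that "shuffling" means permuting the slot order to obtain several distinct inputs for the same interpretation. The function names, the `|` separator, and the `intent=`/`name=value` format are hypothetical and not taken from the paper; the generation model itself is not shown.

```python
import random
from typing import Dict, List


def serialize_interpretation(intent: str, slots: Dict[str, str],
                             rng: random.Random) -> str:
    """Flatten an interpretation into one string with slots in random order.

    The serialization format is an assumption made for illustration only.
    """
    slot_items = list(slots.items())
    rng.shuffle(slot_items)  # permute slot order in place
    parts = [f"intent={intent}"]
    parts += [f"{name}={value}" for name, value in slot_items]
    return " | ".join(parts)


def shuffled_inputs(intent: str, slots: Dict[str, str],
                    attempts: int, seed: int = 0) -> List[str]:
    """Draw `attempts` shuffled serializations and keep the distinct ones."""
    rng = random.Random(seed)
    seen = set()
    distinct: List[str] = []
    for _ in range(attempts):
        candidate = serialize_interpretation(intent, slots, rng)
        if candidate not in seen:
            seen.add(candidate)
            distinct.append(candidate)
    return distinct


if __name__ == "__main__":
    # Hypothetical seed interpretation for a newly added feature.
    example_slots = {"artist": "adele", "album": "25", "device": "kitchen"}
    for model_input in shuffled_inputs("PlayMusic", example_slots, attempts=5):
        # Each string would be one input to a text generation model.
        print(model_input)
```

Each distinct string could then be passed to a text generation model, so that a single seed interpretation yields several differently ordered inputs and, ideally, more varied paraphrases.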
