Multi-domain goal-oriented dialogues (MultiDoGO): Strategies toward curating and annotating large scale dialogue data

Denis Peskov; Jason Krone; Nancy Clarke; Brigi Fodor; Mona Diab; Adel Youssef; Yi Zhang

2019

The need for high-quality, large-scale, goal-oriented dialogue datasets continues to grow as virtual assistants become increasingly widespread. However, existing publicly available datasets for this area are limited in their size, linguistic diversity, domain coverage, or annotation granularity. We introduce the MultiDoGO dataset to overcome these limitations. With a total of over 65,000 dialogues across seven domains, MultiDoGO is nearly an order of magnitude larger than MultiWOZ, the current largest comparable dialogue dataset. We employ a Wizard-of-Oz approach in which a crowd-sourced worker (the “customer”) is paired with a trained annotator (the “agent”). Including a trained participant in each conversation allows us to increase linguistic diversity by avoiding templates while maintaining conversational coherence. We provide separate intent class annotations for customers and agents, along with applicable slot labels at the conversation turn level. We present our strategies for eliciting and annotating a dialogue dataset in a manner that scales across languages and modalities. We establish strong neural baselines for intent classification and slot labeling tasks on each domain.
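To make the turn-level annotation scheme more concrete, the following is a minimal Python sketch of how a single annotated turn might be represented: a speaker role (customer or agent), one intent label for the turn, and one slot tag per token. The class name `Turn`, the intent name `BookFlight`, the slot types `destination` and `date`, and the BIO (begin/inside/outside) tagging convention are all illustrative assumptions. They are not the dataset's actual schema or label inventory.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Speaker roles from the Wizard-of-Oz setup (customer paired with a trained agent).
ROLES = ("customer", "agent")


@dataclass
class Turn:
    """One annotated dialogue turn (hypothetical schema, for illustration only)."""
    speaker: str       # "customer" or "agent"
    tokens: List[str]  # tokenized utterance
    intent: str        # turn-level intent label (label names are hypothetical)
    slots: List[str]   # one BIO tag per token, e.g. "B-date", "I-date", "O" (assumed convention)

    def __post_init__(self) -> None:
        # Reject unknown roles and tag sequences that do not align with the tokens.
        if self.speaker not in ROLES:
            raise ValueError(f"unknown speaker role: {self.speaker!r}")
        if len(self.tokens) != len(self.slots):
            raise ValueError("need exactly one slot tag per token")

    def spans(self) -> List[Tuple[str, str]]:
        """Collapse BIO tags into (slot_type, slot_value) pairs."""
        result: List[Tuple[str, str]] = []
        cur_type: Optional[str] = None
        cur_tokens: List[str] = []

        def flush() -> None:
            # Emit the currently open span, if there is one.
            if cur_type is not None:
                result.append((cur_type, " ".join(cur_tokens)))

        for token, tag in zip(self.tokens, self.slots):
            if tag.startswith("I-") and tag[2:] == cur_type:
                cur_tokens.append(token)  # continue the open span
            elif tag.startswith(("B-", "I-")):
                flush()  # close any open span, then start a new one (a stray I- acts like B-)
                cur_type, cur_tokens = tag[2:], [token]
            else:
                flush()  # an "O" tag closes any open span
                cur_type, cur_tokens = None, []
        flush()
        return result


turn = Turn(
    speaker="customer",
    tokens=["Book", "a", "flight", "to", "New", "York", "on", "Friday"],
    intent="BookFlight",  # hypothetical intent label
    slots=["O", "O", "O", "O", "B-destination", "I-destination", "O", "B-date"],
)
print(turn.spans())  # [('destination', 'New York'), ('date', 'Friday')]
```

Storing the speaker role on every turn reflects the point that customer and agent turns carry their own intent annotations, while the `spans` helper recovers slot values from per-token tags.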
