Language-agnostic and language-aware multilingual natural language understanding for large-scale intelligent voice assistant application
Natural language understanding (NLU) is one of the most critical components of goal-oriented dialog systems and enables innovative Big Data applications such as intelligent voice assistants (IVAs) and chatbots. While recent deep learning-based NLU models have achieved significant accuracy improvements, most existing work is monolingual or bilingual. In this work, we propose and evaluate techniques for developing multilingual NLU models. We first propose a purely language-agnostic multilingual NLU framework that combines a multilingual BERT (mBERT) encoder, a joint decoder for the intent classification and slot filling tasks, and a novel co-appearance regularization technique. We then propose three distinct language-aware multilingual NLU approaches: using the language code as an explicit input; using language-specific parameters during decoding; and using implicit language identification as an auxiliary task. We report results for a large-scale commercial IVA system trained on a varied set of intents with large vocabularies, as well as on a public multilingual NLU dataset. Our experiments explicitly account for code-mixing and language dissimilarity, which are practical concerns in large-scale real-world IVA systems. We find that language-aware designs improve NLU performance when language dissimilarity and code-mixing are present. Together with our proposed architectures, these empirical results provide important insights for designing multilingual NLU systems.
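To make the architecture described above concrete, the following is a minimal NumPy sketch of a joint intent/slot decoder over shared encoder states, with the "language code as explicit input" variant realized as a language embedding concatenated to every token representation. All dimensions, weight initializations, and the stand-in `encode` function are illustrative assumptions, not details from the paper; a real system would use an actual mBERT encoder and trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (illustrative only, not from the paper):
# encoder hidden size, intent labels, slot labels, languages, language-embedding size.
HIDDEN, N_INTENTS, N_SLOTS, N_LANGS, LANG_DIM = 768, 5, 9, 3, 16

def encode(seq_len):
    # Stand-in for an mBERT encoder: one random vector per token.
    return rng.standard_normal((seq_len, HIDDEN))

# Language-aware variant: a learned embedding for the language code is
# concatenated to each token representation before decoding.
lang_emb = rng.standard_normal((N_LANGS, LANG_DIM)) * 0.1
W_intent = rng.standard_normal((HIDDEN + LANG_DIM, N_INTENTS)) * 0.02
W_slot = rng.standard_normal((HIDDEN + LANG_DIM, N_SLOTS)) * 0.02

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def joint_decode(tokens, lang_id):
    h = encode(len(tokens))                            # shared encoder states
    lang = np.tile(lang_emb[lang_id], (len(tokens), 1))
    h = np.concatenate([h, lang], axis=-1)             # explicit language input
    intent_probs = softmax(h.mean(axis=0) @ W_intent)  # one utterance-level intent
    slot_probs = softmax(h @ W_slot)                   # one slot label per token
    return intent_probs, slot_probs

intent, slots = joint_decode(["play", "some", "jazz"], lang_id=0)
```

The purely language-agnostic framework corresponds to dropping the language embedding, while the joint decoder lets the intent and slot heads share the same encoder representation and be trained with a combined loss.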