ADVIN: Automatically discovering novel domains and intents from user text utterances
Recognizing the intents and domains of users’ spoken and written language is a key component of Natural Language Understanding (NLU) systems. Real applications, however, operate in dynamic, rapidly evolving environments where new intents and domains emerge for which no labeled data or prior information is available. For this setting, we propose a novel framework, ADVIN, to automatically discover novel domains and intents from large volumes of unlabeled text. We first employ an open classification model to identify all utterances that potentially express a novel intent. Next, we train a deep learning model with a pairwise margin loss function and knowledge transfer to discover multiple latent intent categories in an unsupervised manner. Finally, we form a hierarchical intent-domain taxonomy by linking mutually related novel intents into novel domains. ADVIN significantly outperforms strong baselines on four benchmark datasets and on data from a real-world voice agent.
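To make the second stage more concrete, the following is a minimal sketch of one common form of pairwise margin loss (the contrastive form). The abstract does not specify the exact loss used in ADVIN, so the formula, the function names (`pairwise_margin_loss`, `mean_pairwise_loss`), the Euclidean distance, and the default margin of 1.0 are all illustrative assumptions rather than the authors' implementation.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pairwise_margin_loss(u, v, same_intent, margin=1.0):
    """Contrastive-style pairwise margin loss for one pair (an assumed form).

    Pairs believed to share an intent are penalized by their squared distance,
    which pulls them together. Other pairs are penalized only while they are
    closer than `margin`, which pushes them apart.
    """
    d = euclidean(u, v)
    if same_intent:
        return d ** 2
    return max(0.0, margin - d) ** 2


def mean_pairwise_loss(embeddings, pairs, margin=1.0):
    """Average the loss over (i, j, same_intent) index pairs."""
    total = sum(
        pairwise_margin_loss(embeddings[i], embeddings[j], same, margin)
        for i, j, same in pairs
    )
    return total / len(pairs)


# Hypothetical 2-D utterance embeddings: the first two are close, the third is far.
embeddings = [[0.0, 0.0], [0.1, 0.0], [2.0, 0.0]]
pairs = [(0, 1, True), (0, 2, False), (1, 2, False)]
loss = mean_pairwise_loss(embeddings, pairs)
```

In this toy setup only the similar pair contributes to the loss, because both dissimilar pairs already lie farther apart than the margin. In an unsupervised setting the `same_intent` labels would have to come from some proxy signal, such as knowledge transferred from known intents, which the sketch leaves out.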