Multi-task learning of spoken language understanding by integrating n-best hypotheses with hierarchical attention
In current spoken language understanding (SLU) systems, the automatic speech recognition (ASR) module produces multiple interpretations (or hypotheses) of the input audio signal, and the natural language understanding (NLU) module uses only the one with the highest confidence score for domain or intent classification. However, these interpretations can be noisy, and relying on a single one can cause information loss. To address this problem, many studies rerank the interpretations to select a better one, while some recent work achieves better performance by integrating all the hypotheses during prediction. In this paper, we follow the hypothesis-integration approach but strengthen training by adding more tasks via multi-task learning or transfer learning, including related tasks that are not among the existing NLU tasks. Moreover, we propose the Hierarchical Attention Mechanism (HAM), which further improves performance by incorporating acoustic-model features such as confidence scores, which current hypothesis-integration models ignore. Experimental results show that, compared with the standard approach of using a single hypothesis, multi-task learning with HAM improves domain and intent classification by a relative 19% and 37%, respectively, which is much larger than the improvements achieved by current integration or reranking methods. To illustrate where these improvements come from, we decode the hidden representations of several example utterances and compare the generated texts with the hypotheses and the transcripts. The comparison shows that our model can recover the transcription by integrating fragmented information across hypotheses and by identifying frequent error patterns of the ASR module, and can even rewrite the query for better understanding, reflecting the knowledge-broadcasting characteristic of multi-task learning.
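To make the idea of hierarchical attention over n-best hypotheses more concrete, the following is a minimal Python sketch written under our own assumptions; it is not the authors' implementation. Word-level attention pools each hypothesis's word vectors into one hypothesis vector, and hypothesis-level attention then pools the hypothesis vectors into one utterance vector. The ASR confidence score enters as an additive log-confidence bias on each hypothesis's attention logit. The function names, the dot-product scoring, the fixed query vectors (standing in for learned parameters), and the log-confidence bias are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(vectors: Sequence[Sequence[float]], weights: Sequence[float]) -> Vector:
    dim = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) for i in range(dim)]


def word_attention(words: Sequence[Sequence[float]], query: Sequence[float]) -> Vector:
    # Level 1 (hypothetical): attend over the words of ONE hypothesis.
    weights = softmax([dot(w, query) for w in words])
    return weighted_sum(words, weights)


def hierarchical_attention(
    hypotheses: Sequence[Sequence[Sequence[float]]],
    confidences: Sequence[float],
    word_query: Sequence[float],
    hyp_query: Sequence[float],
) -> Tuple[Vector, List[float]]:
    """Pool an n-best list into one vector; returns (utterance_vector, hypothesis_weights)."""
    if any(c <= 0 for c in confidences):
        raise ValueError("confidence scores must be positive to take their log")
    # Level 1: one pooled vector per hypothesis.
    hyp_vectors = [word_attention(words, word_query) for words in hypotheses]
    # Level 2 (assumption): content score plus log(confidence), so an ASR
    # confidence score multiplies the unnormalized attention weight.
    scores = [dot(h, hyp_query) + math.log(c) for h, c in zip(hyp_vectors, confidences)]
    hyp_weights = softmax(scores)
    return weighted_sum(hyp_vectors, hyp_weights), hyp_weights


if __name__ == "__main__":
    # Two toy hypotheses with 2-d word embeddings and ASR confidences.
    nbest = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.5, 0.5]]]
    vec, weights = hierarchical_attention(nbest, [0.7, 0.3], [0.2, 0.1], [0.3, -0.1])
    print(vec, weights)
```

One consequence of the log-confidence bias is that, with all-zero query vectors, the hypothesis weights reduce exactly to the normalized confidence scores. The learned queries can therefore only shift attention away from the ASR ranking when the hypothesis content justifies it.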