Learning under label noise for robust spoken language understanding systems
Most real-world datasets contain inherent label noise, which leads to memorization and overfitting when such data is used to train over-parameterized deep neural networks. While memorization in DNNs has been studied extensively in the computer vision literature, the impact of noisy labels and of various mitigation strategies on Spoken Language Understanding tasks is largely under-explored. In this paper, we perform a systematic study of the effectiveness of five noise mitigation methods on Spoken Language Understanding text classification tasks. First, we experiment on three publicly available datasets by synthetically injecting noise into the labels and evaluate the effectiveness of each method at different levels of noise intensity. We then evaluate these methods on real-world data from a large-scale industrial Spoken Language Understanding system. Our results show that most methods are effective in mitigating the impact of noise, with two of the methods showing consistently better results. For the industrial Spoken Language Understanding system, the best-performing method recovers 65% to 97% of the performance lost due to noise.
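The abstract does not specify how noise is injected; a common choice in the label-noise literature is symmetric (uniform) noise, where a fraction of labels is flipped to a uniformly chosen different class. A minimal sketch of such an injection routine, assuming integer class labels (the function and parameter names are illustrative, not from the paper):

```python
import numpy as np

def inject_symmetric_noise(labels, noise_rate, num_classes, seed=0):
    """Flip a fraction `noise_rate` of labels to a uniformly chosen
    *different* class (symmetric/uniform label noise)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels).copy()
    n = len(labels)
    # pick which examples to corrupt, without replacement
    flip_idx = rng.choice(n, size=int(noise_rate * n), replace=False)
    for i in flip_idx:
        # choose the corrupted label uniformly among the other classes
        choices = [c for c in range(num_classes) if c != labels[i]]
        labels[i] = rng.choice(choices)
    return labels

clean = np.zeros(100, dtype=int)
noisy = inject_symmetric_noise(clean, noise_rate=0.3, num_classes=5)
print((noisy != clean).mean())  # → 0.3
```

Because every flipped label is forced to differ from the original, the observed corruption rate matches `noise_rate` exactly, which makes the noise-intensity levels in the synthetic experiments easy to control.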