CML: A contrastive meta learning method to estimate human label confidence scores and reduce data collection cost

Bo Dong; Yiyi Wang; Hanbo Sun; Yunji Wang; Alireza Hashemi; Zheng Du

2022

Deep neural network models are especially susceptible to noise in annotated labels. In the real world, annotated data typically contains noise caused by a variety of factors, such as task difficulty, annotator experience, and annotator bias. Label quality is critical for label validation tasks; however, correcting for noise by collecting more data is often costly. In this paper, we propose a contrastive meta-learning framework (CML) to address the challenges introduced by noisy annotated data, specifically in the context of natural language processing. CML combines contrastive learning and meta-learning to improve the quality of text feature representations, and it also uses meta-learning to generate confidence scores that assess label quality. We demonstrate that a model built on CML-filtered data outperforms a model built on clean data. Furthermore, we perform experiments on de-identified commercial voice assistant datasets and demonstrate that our model outperforms several state-of-the-art (SOTA) approaches.
