UseClean: Learning from complex noisy labels in named entity recognition

Jinjin Tian; Kun Zhou; Meiguo Wang; Yu Zhang; Benjamin Yao; Xiaohu Liu; Chenlei (Edward) Guo

2023

We investigate and refine denoising methods for the NER task on data that potentially contains extremely noisy labels from multiple sources. In this paper, we first summarize all possible noise types and noise generation schemes, based on which we build a thorough evaluation system. We then use this evaluation system to pinpoint the bottleneck of current state-of-the-art denoising methods. Correspondingly, we propose several refinements: a two-stage framework to avoid error accumulation; a novel confidence score that uses minimal clean supervision to increase predictive power; automatic cutoff fitting to save extensive hyperparameter tuning; and a warm-started weighted partial CRF to better learn on the noisy tokens. Additionally, we propose adaptive sampling to further boost performance in long-tailed entity settings. Across extensive experiments, our method improves the F1 score by at least 5 to 10% on average over the current state of the art.
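To make the idea of automatic cutoff fitting concrete, the sketch below shows one way a threshold on per-token confidence scores could be chosen from the data rather than tuned by hand. The abstract does not say how the cutoff is fitted, so this is an assumption: it substitutes an Otsu-style criterion (maximize the between-group variance of the scores), and the names `fit_cutoff` and `split_tokens` are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_cutoff(scores: Sequence[float]) -> float:
    """Pick a cutoff that best separates low scores from high scores.

    Assumption: an Otsu-style criterion stands in for the paper's
    (unspecified) fitting procedure. If all scores are equal, the
    common value is returned and nothing falls below the cutoff.
    """
    values = sorted(scores)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two scores")
    total = sum(values)
    best_cutoff, best_variance = values[0], -1.0
    left_sum = 0.0
    for i in range(1, n):
        left_sum += values[i - 1]
        if values[i] == values[i - 1]:
            continue  # no valid split point between equal scores
        w_low, w_high = i / n, (n - i) / n
        mean_low = left_sum / i
        mean_high = (total - left_sum) / (n - i)
        # Between-group variance of the candidate split.
        variance = w_low * w_high * (mean_low - mean_high) ** 2
        if variance > best_variance:
            best_variance = variance
            best_cutoff = (values[i - 1] + values[i]) / 2
    return best_cutoff


def split_tokens(
    scores: Sequence[float], cutoff: float
) -> Tuple[List[int], List[int]]:
    """Return (trusted, suspect) token indices relative to the cutoff."""
    trusted = [i for i, s in enumerate(scores) if s >= cutoff]
    suspect = [i for i, s in enumerate(scores) if s < cutoff]
    return trusted, suspect
```

In a two-stage setup of the kind the abstract describes, the suspect indices would be the tokens whose labels are treated as noisy in the second stage; how the paper actually handles them (for example through the weighted partial CRF) is not specified here.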
