Adversarial named-entity recognition with word attributions and disentanglement
2023
The problem of enhancing the robustness of Named Entity Recognition (NER) models against adversarial attacks has recently gained significant attention (Simoncini and Spanakis, 2021; Lin et al., 2021). Existing techniques for robustifying NER models rely on exhaustive perturbation of the input training data to generate adversarial examples, often producing examples that are not semantically equivalent to the originals. In this paper, we employ word-attribution-guided perturbations that generate adversarial examples with comparable attack rates but at a lower modification rate. Our approach also disentangles entity and non-entity word representations to generate diverse and unbiased adversarial examples. Adversarial training with our method improves the F1 score over the originally trained NER model by 8% on CoNLL-2003 and by 18% on OntoNotes 5.0.
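The core idea of attribution-guided perturbation can be sketched as follows. This is a minimal illustration, not the paper's implementation: the attribution scores and substitution table below are hypothetical stand-ins for a real attribution method (e.g. integrated gradients over the NER model) and a lexical substitution model. It shows how restricting perturbations to the highest-attribution tokens keeps the modification rate low.

```python
from typing import Dict, List, Tuple

def attribution_guided_perturb(
    tokens: List[str],
    attributions: List[float],
    substitutes: Dict[str, str],
    budget: int = 2,
) -> Tuple[List[str], float]:
    """Perturb only the `budget` highest-attribution tokens that have a
    substitute available, rather than exhaustively perturbing every token.
    Returns the perturbed sentence and its modification rate."""
    # Rank token positions by attribution score, highest first.
    ranked = sorted(range(len(tokens)), key=lambda i: attributions[i], reverse=True)
    perturbed = list(tokens)
    changed = 0
    for i in ranked:
        if changed >= budget:
            break
        if tokens[i] in substitutes:
            perturbed[i] = substitutes[tokens[i]]
            changed += 1
    return perturbed, changed / len(tokens)

# Hypothetical saliency scores and substitution table for illustration only.
tokens = ["Obama", "visited", "Berlin", "last", "week"]
attributions = [0.9, 0.2, 0.8, 0.1, 0.1]
substitutes = {"visited": "toured", "Berlin": "Munich", "week": "month"}

adv, mod_rate = attribution_guided_perturb(tokens, attributions, substitutes)
print(adv, mod_rate)  # only the 2 most salient substitutable tokens change
```

Here only "Berlin" and "visited" are replaced (the highest-attribution tokens with available substitutes), giving a modification rate of 0.4 rather than perturbing the full sentence.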