Enhance robustness of sequence labelling with masked adversarial training
Adversarial training (AT) has shown strong regularization effects on deep learning algorithms by introducing small input perturbations to improve model robustness. In language tasks, adversarial training brings word-level robustness by adding input noise, which is beneficial for text classification. However, it lacks sufficient contextual information enhancement and thus is less useful for sequence labelling tasks such as chunking and named entity recognition (NER). To address this limitation, we propose masked adversarial training (MAT) to improve robustness from contextual information in sequence labelling. MAT masks or replaces some words in the sentence when computing adversarial loss from perturbed inputs and consequently enhances model robustness using more context-level information. In our experiments, our method shows significant improvements on accuracy and robustness of sequence labelling. By further incorporating with ELMo embeddings, our model achieves better or comparable results to state-of-the-art on CoNLL 2000 and 2003 benchmarks using much less parameters.