Invariant representation learning for robust deep networks
Deep neural networks are often brittle to superficial perturbations of their inputs; models that perform well offline on held-out data can still break under small amounts of naturally-occurring or adversarial shifts. We consider invariant representation learning (IRL), first proposed in the domain of speech recognition, as a simple, effective, and general extension to data augmentation. Rather than only presenting original and noisy inputs as having the same label, IRL also promotes similar intermediate representations for original examples and their noised counterparts. The approach penalizes the distance (typically L2 and cosine distances) between their activations, at every layer above a chosen bottleneck. We motivate IRL from vicinal risk motivation and existing regularizers, formulate IRL for image classification, language modeling, speech recognition, and semi-supervised learning, and experimentally show improvements on these tasks in terms of accuracy and in robustness to synthetic, out-of-domain, and adversarial noise.