Semi-supervised text classification by gradually updating layers
Most recent neural semi-supervised learning (SSL) algorithms rely on adding small perturbations to either the input vectors or their representations. These methods have been successful on computer vision tasks, where images form a continuous manifold, but they are not appropriate for discrete inputs such as sentences. To adapt them to text input, we propose decomposing a neural network M into two components, F and U, so that M = U∘F. The layers in F are frozen, and only the layers in U are updated for most of the training. In this way, F serves as a feature extractor that maps the input to a high-level representation and adds systematic noise through dropout. We can then train U with any state-of-the-art SSL algorithm, such as the Π-model, temporal ensembling, or mean teacher. Furthermore, this gradual unfreezing schedule also prevents catastrophic forgetting in a pre-trained model. Experimental results show that our approach improves on state-of-the-art methods, especially on short texts.
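The following is a minimal sketch of the idea in plain Python, not the implementation used in this work. It assumes a hypothetical schedule in which the topmost frozen layer moves from F into U every few epochs, uses a single linear layer with softmax as a toy stand-in for U, and takes a mean squared difference between two dropout-perturbed predictions as a Π-model style consistency loss. The names `trainable_layers`, `classify`, and `consistency_loss`, as well as all numeric values, are illustrative assumptions.

```python
import math
import random


def trainable_layers(num_layers, num_frozen, epoch, unfreeze_every):
    """Return indices of the layers updated at `epoch`.

    Layers [0, num_frozen) start out as the frozen extractor F and the rest
    form U. Assumed schedule (hypothetical): every `unfreeze_every` epochs
    the topmost frozen layer moves from F into U.
    """
    boundary = max(0, num_frozen - epoch // unfreeze_every)
    return list(range(boundary, num_layers))


def dropout(features, p, rng):
    """Inverted dropout: zero each entry with probability p, rescale the rest."""
    return [0.0 if rng.random() < p else x / (1.0 - p) for x in features]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, weights):
    """Toy stand-in for U: one linear layer followed by softmax."""
    logits = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return softmax(logits)


def consistency_loss(p1, p2):
    """Pi-model style loss: mean squared difference of two predictions."""
    return sum((a - b) ** 2 for a, b in zip(p1, p2)) / len(p1)


rng = random.Random(0)
features = [0.2, -1.0, 0.5, 0.8]      # pretend output of the frozen F
weights = [[0.1, 0.3, -0.2, 0.4],     # toy parameters of U (2 classes)
           [-0.3, 0.2, 0.5, -0.1]]

# Two stochastic passes over the same unlabeled input differ only by dropout.
pred_a = classify(dropout(features, 0.5, rng), weights)
pred_b = classify(dropout(features, 0.5, rng), weights)
print("consistency loss:", consistency_loss(pred_a, pred_b))

# Which layers of a 4-layer model are updated as training proceeds.
for epoch in (0, 2, 4, 6):
    print(epoch, trainable_layers(4, 3, epoch, 2))
```

In this sketch the perturbation comes entirely from dropout applied to the representation produced by F, so no noise has to be added to the discrete input itself. In a real model the consistency loss would be minimized with respect to the parameters of the currently trainable layers only.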