Defuse: Training more robust models through creation and correction of novel model errors
We typically assess the generalization of machine learning models by computing aggregate statistics on held-out test data. However, test data is limited in coverage, and in practice important cases are often missed. As a result, the performance of deployed machine learning models can be variable and untrustworthy. Motivated by these concerns, we develop methods to generate and correct novel model errors beyond those available in the data. We propose Defuse: a technique that trains a generative model on a classifier’s training dataset and then uses the latent space to generate new samples that the classifier no longer predicts correctly. For instance, given a classifier trained on the MNIST dataset that correctly predicts a test image, Defuse uses this image to generate new, similar images by sampling from the latent space. Defuse then identifies the generated images whose predicted label differs from the label of the original test input. Defuse enables efficient labeling of these new images, allowing users to re-train a more robust model and thus improve overall model performance. We evaluate Defuse on classifiers trained on real-world datasets and find that it reveals novel sources of model errors.
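The generation and identification steps described above can be illustrated with a minimal sketch. It is not the implementation evaluated in this work: the function name `find_candidate_errors`, its parameters, the Gaussian perturbation of the latent code, and the toy identity encoder, decoder, and threshold classifier are all assumptions made for illustration. In practice, `encode` and `decode` would come from a generative model trained on the classifier's training data.

```python
import numpy as np

def find_candidate_errors(x, label, encode, decode, classify,
                          n_samples=64, scale=0.5, seed=0):
    """Sample near x in latent space and keep the decoded samples whose
    predicted label differs from `label` (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    z = encode(x)  # latent code of the correctly classified input
    # Assumption: Gaussian perturbations around z stand in for "sampling
    # from the latent space".
    noise = rng.normal(0.0, scale, size=(n_samples,) + z.shape)
    candidates = np.stack([decode(z + eps) for eps in noise])
    preds = np.array([classify(c) for c in candidates])
    mask = preds != label  # prediction no longer matches the original label
    return candidates[mask], preds[mask]

# Toy stand-ins (assumptions): identity encoder/decoder on 2-D points and a
# classifier that thresholds the first coordinate.
encode = lambda x: x
decode = lambda z: z
classify = lambda x: int(x[0] > 0)

x = np.array([0.2, 0.0])
label = classify(x)  # the classifier's prediction on the original input
flipped, flipped_preds = find_candidate_errors(x, label, encode, decode, classify)
print(f"{len(flipped)} of 64 generated samples changed the predicted label")
```

The returned samples are only candidate errors. A sample whose predicted label changed may be a genuine misclassification or may truly belong to another class, which is why the labeling step precedes re-training.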