This repository contains the files for the blog post "Detect adversarial inputs using Amazon SageMaker Model Monitor and Amazon SageMaker Debugger".
Create a SageMaker notebook instance and clone the repository: git clone https://github.com/amazon-research/detecting-adversarial-samples-using-sagemaker.git
In the notebook Detecting_adversarial_samples.ipynb, we first train an image classification model (ResNet18) on CIFAR-10 and then deploy it on Amazon SageMaker.
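The training and deployment step follows the standard SageMaker Python SDK flow. The sketch below is illustrative only: the entry-point script name, S3 data path, hyperparameters, and instance types are assumptions, not necessarily the notebook's exact settings.

```python
import sagemaker
from sagemaker.pytorch import PyTorch

role = sagemaker.get_execution_role()

estimator = PyTorch(
    entry_point="train.py",          # assumed training script name
    role=role,
    framework_version="1.8.0",
    py_version="py36",
    instance_count=1,
    instance_type="ml.p3.2xlarge",   # illustrative instance type
    hyperparameters={"epochs": 20, "batch-size": 128},
)

# Train on CIFAR-10 data previously uploaded to S3 (path is a placeholder).
estimator.fit({"training": "s3://<your-bucket>/cifar10"})

# Deploy the trained ResNet18 as a real-time endpoint.
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
)
```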
We will set up a custom SageMaker Model Monitor schedule that periodically kicks off a custom processing job, which runs a two-sample statistical test using MMD (maximum mean discrepancy) to detect adversarial samples.
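The heart of that processing job is the MMD statistic itself. As a rough illustration (not the exact implementation in this repository), a biased MMD² estimate with an RBF kernel and a permutation-based p-value can be computed like this; the kernel bandwidth and number of permutations are arbitrary choices here:

```python
import numpy as np

def rbf_kernel(x, y, sigma):
    """RBF (Gaussian) kernel matrix between rows of x and rows of y."""
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    return np.exp(-sq_dists / (2.0 * sigma**2))

def mmd2(x, y, sigma=1.0):
    """Biased estimate of the squared MMD between samples x and y."""
    k_xx = rbf_kernel(x, x, sigma)
    k_yy = rbf_kernel(y, y, sigma)
    k_xy = rbf_kernel(x, y, sigma)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()

def permutation_test(x, y, sigma=1.0, n_permutations=500, seed=0):
    """Two-sample test: p-value of the observed MMD under random relabeling."""
    rng = np.random.default_rng(seed)
    observed = mmd2(x, y, sigma)
    pooled = np.vstack([x, y])
    n = len(x)
    count = 0
    for _ in range(n_permutations):
        perm = rng.permutation(len(pooled))
        x_p, y_p = pooled[perm[:n]], pooled[perm[n:]]
        if mmd2(x_p, y_p, sigma) >= observed:
            count += 1
    return (count + 1) / (n_permutations + 1)
```

A small p-value indicates that the two samples (for example, baseline representations and representations captured at inference time) are unlikely to come from the same distribution.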
The image below shows the t-SNE visualizations of feature representations for natural and adversarial samples obtained from different layers in the model (layer 0 represents the model inputs). We can see that adversarial samples become more distinguishable in the deeper layers of the ResNet18 model. The intuition is that raw inputs are noisy and high-dimensional, whereas the latent representations produced by the deeper layers of a neural network capture low-dimensional semantic information. We will use SageMaker Debugger in the endpoint to capture these representations during inference.
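Once the Debugger hook attached to the endpoint has written the captured layer outputs to S3, they can be loaded with the smdebug trial API and embedded with t-SNE. The sketch below uses placeholder values for the S3 output path and the tensor name; the notebook defines the actual Debugger output location and layer names.

```python
import numpy as np
from smdebug.trials import create_trial
from sklearn.manifold import TSNE

# Placeholder path: the Debugger output location configured for the endpoint.
trial = create_trial("s3://<your-bucket>/endpoint-tensors")

# Collect the output of one layer across all recorded inference steps.
tensor_name = "layer4.output_0"   # assumed name of a deep ResNet18 layer
features = np.concatenate(
    [trial.tensor(tensor_name).value(step) for step in trial.steps()]
)

# Flatten spatial dimensions and embed into 2D for visualization.
embedding = TSNE(n_components=2).fit_transform(
    features.reshape(len(features), -1)
)
```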
We will then run inference with both test and adversarial images and evaluate how reliably the adversarial ones are detected by our custom Model Monitor.
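To give a flavor of this last step, the sketch below crafts an adversarial image with FGSM (used here purely for illustration; the notebook may rely on a different attack or library) and sends both the natural and the perturbed image to the endpoint, where Debugger records the intermediate representations for the monitor's MMD test. It assumes `model`, `image`, `label`, and `predictor` exist from the earlier steps.

```python
import torch
import torch.nn.functional as F

def fgsm(model, image, label, epsilon=0.03):
    """Perturb `image` in the direction that increases the loss (FGSM)."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()

# Send natural and adversarial images to the deployed endpoint; Debugger
# captures the layer representations, which Model Monitor later compares
# against the baseline with the MMD two-sample test.
natural_prediction = predictor.predict(image.numpy())
adversarial_prediction = predictor.predict(fgsm(model, image, label).numpy())
```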