Improving explainable AI’s explanations
Causal analysis improves both the classification accuracy and the relevance of the concepts identified by popular concept-based explanatory models.
Explainability is an important research topic in AI today. If we’re going to trust deep-learning systems to make decisions for us, we often want to know why they make the decisions they do.
One popular approach to explainable AI is concept-based explanation. Instead of simply learning to predict labels from input features, the model learns to assign values to a large array of concepts. For instance, if the inputs are images of birds, the concepts might be things like bill shape, breast color, and wing pattern. Then, on the basis of the concept values, the model classifies the input: say, a yellow grosbeak.
But this approach can run into trouble if there are confounders in the training data. For instance, if birds with spatulate bills are consistently photographed on the water, the model could learn to associate water imagery with the concept “bill shape: spatulate”. And that could produce nonsensical results in the case of, say, a starling that happened to be photographed near a lake.
In a paper that Amazon distinguished scientist David Heckerman and I are presenting this week at the International Conference on Learning Representations (ICLR), we adapt a technique for removing confounders from causal models, called instrumental-variable analysis, to the problem of concept-based explanation.
In tests on a benchmark dataset of images annotated with concept labels, we show that our method increases the classification accuracy of a concept-based explanatory model by an average of 25%. Using the remove-and-retrain (ROAR) methodology, we also show that our method improves the model’s ability to identify concepts relevant to the correct image label.
Our analysis begins with a causal graph, which encodes our prior belief about the causal relationships among the variables. In our case, the belief is that a prediction target (y) causes a concept representation (c), which in turn causes an input (x). Note that prediction runs in the opposite direction, from input to concept to label, but this doesn’t matter, since the statistical relationships between input and concept, and between concept and label, remain the same.
Confounders complicate this simple model. In the figure below, u is a confounder, which influences both the input and the concept (c) learned by the model; d is the debiased concept we wish to learn.
In the terms of our example, u represents the watery backgrounds common to images of birds with spatulate bills, c is the confounded concept of bill shape, and d is a debiased concept of bill shape, which correlates with actual visual features of birds’ bills.
Note, too, that there is a second causal path between input and label, which bypasses concept representation. The experts who label images of birds, for instance, may rely on image features not captured by the list of concepts.
Our approach uses a trick from classic instrumental-variable analysis, which considers the case in which a variable p has a causal effect on the variable q, but that effect is obscured by a confounding variable, u, which influences both p and q. The analysis posits an instrumental variable, z, which is correlated with p, is independent of the confounder u, and influences q only through p. Instrumental-variable analysis uses regression to estimate p from z; since z is independent of the confounder u, so is the estimate of p, known as p̂. A regression of q on p̂ is thus an estimate of the causal impact of p on q.
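To make the two regressions concrete, here is a minimal sketch of instrumental-variable estimation by two-stage least squares on synthetic data. Everything in it is an assumption made for illustration: the linear data-generating process, the coefficients, and the function names `two_stage_least_squares` and `naive_ols` do not come from the paper.

```python
import numpy as np

def two_stage_least_squares(z, p, q):
    """Estimate the effect of p on q using the instrument z (1-D arrays)."""
    # Stage 1: regress p on z. The fitted values p_hat depend only on z,
    # so they carry no influence from the confounder u.
    Z = np.column_stack([np.ones_like(z), z])
    stage1, *_ = np.linalg.lstsq(Z, p, rcond=None)
    p_hat = Z @ stage1
    # Stage 2: regress q on p_hat; the slope estimates the causal effect.
    P = np.column_stack([np.ones_like(p_hat), p_hat])
    stage2, *_ = np.linalg.lstsq(P, q, rcond=None)
    return stage2[1]

def naive_ols(p, q):
    """Ordinary regression of q on p, which ignores the confounder."""
    P = np.column_stack([np.ones_like(p), p])
    coef, *_ = np.linalg.lstsq(P, q, rcond=None)
    return coef[1]

# Synthetic data (assumed for illustration): u confounds both p and q.
rng = np.random.default_rng(0)
n = 20000
u = rng.normal(size=n)                      # confounder
z = rng.normal(size=n)                      # instrument, independent of u
p = z + u + 0.1 * rng.normal(size=n)
true_effect = 2.0
q = true_effect * p + 3.0 * u + 0.1 * rng.normal(size=n)

iv_estimate = two_stage_least_squares(z, p, q)
ols_estimate = naive_ols(p, q)
print(f"true effect {true_effect}, IV {iv_estimate:.2f}, naive {ols_estimate:.2f}")
```

In this setup the naive regression overstates the effect because p and q both move with u, while the two-stage estimate should land close to the true coefficient.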
In our causal graph above, we can use regression to estimate d from y and c from d, breaking the causal link between u and the estimate of c, ĉ. (In practice, we just set the estimate of c equal to the estimate of d.)
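One simple way to picture the regression of concepts on labels is sketched below. It assumes discrete labels and a linear regression on one-hot label vectors, in which case the estimate for each example reduces to the mean concept vector of its class. The function name `debias_concepts`, the synthetic data, and the linear form are illustrative assumptions, not the paper's implementation, and the sketch does not cover how the estimates are used during training.

```python
import numpy as np

def debias_concepts(concepts, labels, num_classes):
    """Replace each concept vector with a regression estimate from its label.

    concepts: (n, k) array of concept values; labels: (n,) integer class ids.
    """
    one_hot = np.eye(num_classes)[labels]                 # (n, num_classes)
    # Least squares on one-hot labels: each row of coef is a class mean.
    coef, *_ = np.linalg.lstsq(one_hot, concepts, rcond=None)
    return one_hot @ coef

# Synthetic data (assumed): a confounder u shifts the observed concepts
# but is independent of the label.
rng = np.random.default_rng(0)
n, num_classes = 300, 3
labels = rng.integers(0, num_classes, size=n)
prototypes = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
u = rng.normal(size=(n, 1))
concepts = prototypes[labels] + 0.5 * u + 0.05 * rng.normal(size=(n, 2))

c_hat = debias_concepts(concepts, labels, num_classes)

# Correlation with the confounder before and after the regression.
corr_raw = np.corrcoef(concepts[:, 0], u[:, 0])[0, 1]
corr_hat = np.corrcoef(c_hat[:, 0], u[:, 0])[0, 1]
print(f"correlation with u: raw {corr_raw:.2f}, estimated {corr_hat:.2f}")
```

Because the estimate depends on the example only through its label, variation in the concepts that tracks u within a class is averaged away.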
Using a benchmark dataset that contains 11,788 images of 200 types of birds, annotated according to 312 concepts, we trained two concept-based explanatory models, which were identical except that one used regression to estimate concepts and one didn’t. The model that used regression was 25% more accurate than the one that didn’t.
The accuracy of the classifier, however, doesn’t tell us anything about the accuracy of the concept identification, which is the other purpose of the model. To evaluate that, we used the ROAR method. First, we trained both models using all 312 concepts for each training example. Then we discarded the least relevant 31 concepts (10%) for each training example and retrained the models. Then we discarded the next least relevant 31 concepts per example and retrained, and so on.
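The removal schedule described above can be sketched as follows. This is a hedged outline, not the evaluation code used in the paper: the per-example relevance scores, the choice of 0.0 as the fill value for discarded concepts, and the names `mask_least_relevant`, `roar_curve`, and `train_and_evaluate` are all assumptions. The real training routine is left as a callable that the caller supplies.

```python
import numpy as np

def mask_least_relevant(concepts, relevance, num_removed, fill_value=0.0):
    """Copy `concepts`, overwriting each row's least relevant entries."""
    masked = concepts.copy()
    if num_removed > 0:
        # Ascending argsort: the first columns index the least relevant concepts.
        drop = np.argsort(relevance, axis=1)[:, :num_removed]
        np.put_along_axis(masked, drop, fill_value, axis=1)
    return masked

def roar_curve(concepts, relevance, train_and_evaluate, step, num_rounds):
    """Retrain after each cumulative removal and collect the scores.

    `train_and_evaluate` is a caller-supplied function (hypothetical here)
    that trains a fresh model on the masked concepts and returns its accuracy.
    """
    return [
        train_and_evaluate(mask_least_relevant(concepts, relevance, step * r))
        for r in range(num_rounds + 1)
    ]

# Tiny example: 2 training examples, 6 concepts, 2 removed per round.
concepts = np.arange(1.0, 13.0).reshape(2, 6)
relevance = np.array([[5, 0, 4, 1, 3, 2],
                      [0, 1, 2, 3, 4, 5]], dtype=float)
masked = mask_least_relevant(concepts, relevance, 2)

# Stand-in for real training so the sketch runs: counts surviving concepts.
count_kept = lambda x: int((x != 0).sum())
curve = roar_curve(concepts, relevance, count_kept, step=2, num_rounds=2)
```

With 312 concepts, the schedule in the text would correspond to `step=31`, with `train_and_evaluate` retraining each model from scratch at every round.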
We find that, as irrelevant concepts are discarded, our debiased model exhibits greater relative improvement in accuracy than the baseline model. This indicates that our model does a better job than the baseline of identifying relevant concepts.