But are you sure? An uncertainty-aware perspective on explainable AI
2022
Even when a black-box model makes accurate predictions (e.g., whether it will rain tomorrow), it is difficult to extract principles from the model that improve human understanding (e.g., what set of atmospheric conditions best predicts rainfall). The field of model explainability approaches this problem by identifying salient aspects of the model, such as the data features to which the model is most sensitive. However, these methods can be unstable and inconsistent, leading to unreliable insights. Specifically, when there are many near-optimal models, there is no guarantee that a single explanation for a best-fitted model will agree with the “true explanation”: the explanation from the (unknown) true model that generated the data. In this work, we aim to construct an uncertainty set that is guaranteed to include the true explanation with high probability. We develop methods to compute such a set in both frequentist and Bayesian settings. Through synthetic experiments, we demonstrate that our uncertainty sets have high fidelity to the explanations of the true model. Real-world experiments confirm the effectiveness of our approach.
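To make the idea of an uncertainty set over explanations concrete, the sketch below shows one simple, generic way such a set could be formed: bootstrapping a model's feature-importance values (here, linear-regression coefficients) across refitted near-optimal models and taking per-feature percentile intervals. This is an illustrative assumption on my part, not the paper's frequentist or Bayesian procedure; the function name and parameters are hypothetical.

```python
# Illustrative sketch only: NOT the paper's method. It forms a crude
# uncertainty set over explanations by bootstrapping feature importances
# (linear-regression coefficients) across refitted models.
import numpy as np
from sklearn.linear_model import LinearRegression


def explanation_uncertainty_set(X, y, n_boot=1000, alpha=0.05, seed=0):
    """Per-feature (1 - alpha) percentile intervals over bootstrap explanations."""
    rng = np.random.RandomState(seed)
    n, d = X.shape
    coefs = np.empty((n_boot, d))
    for b in range(n_boot):
        idx = rng.randint(0, n, size=n)      # resample rows with replacement
        model = LinearRegression().fit(X[idx], y[idx])
        coefs[b] = model.coef_               # coefficients serve as the "explanation"
    lower = np.percentile(coefs, 100 * alpha / 2, axis=0)
    upper = np.percentile(coefs, 100 * (1 - alpha / 2), axis=0)
    return lower, upper


# Usage (hypothetical data): lo, hi = explanation_uncertainty_set(X, y)
# A wide interval for a feature signals that its importance is unstable
# across near-optimal refits, which is exactly the ambiguity the paper's
# uncertainty sets are designed to capture with formal coverage guarantees.
```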