Improving large-scale conversational assistants using model interpretation based training sample selection
Natural language understanding (NLU) models are a core component of large-scale conversational assistants. Collecting training data for these models through manual annotation is slow and expensive, which impedes the pace of model improvement. We present a three-stage approach to address this challenge: First, we identify a large set of relatively infrequent utterances from live traffic where users implicitly communicated satisfaction with a response (such as by not interrupting), along with the existing model outputs as candidate annotations. Second, we identify a small subset of these utterances using Integrated Gradients-based importance scores computed with the current models. Finally, we augment our training sets with these utterances and retrain our models. We demonstrate the effectiveness of our approach in a large-scale conversational assistant, processing billions of utterances every week. By augmenting our training set with just 0.05% more utterances through our approach, we observe statistically significant improvements for infrequent tail utterances: a 0.45% reduction in semantic error rate (SemER) in offline experiments, and a 1.23% reduction in defect rates in online A/B tests.
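To make the second stage concrete, the sketch below shows how Integrated Gradients attributions can be computed. This is an illustrative example only, not the paper's implementation: it uses a toy logistic model with an analytic gradient (so no autograd framework is needed), a zero baseline, and a midpoint Riemann sum to approximate the IG path integral. The sanity check at the end uses the completeness axiom, which states that the attributions sum to the difference in model output between the input and the baseline.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def integrated_gradients(x, baseline, w, steps=100):
    """Midpoint Riemann-sum approximation of Integrated Gradients
    for a toy logistic model f(x) = sigmoid(w . x).

    The gradient of f is sigmoid'(w . x) * w = s * (1 - s) * w,
    so we can evaluate it analytically instead of via autograd.
    """
    alphas = (np.arange(steps) + 0.5) / steps  # midpoints in (0, 1)
    total_grad = np.zeros_like(x)
    for a in alphas:
        point = baseline + a * (x - baseline)  # point on the straight path
        s = sigmoid(w @ point)
        total_grad += s * (1.0 - s) * w        # analytic gradient at `point`
    avg_grad = total_grad / steps
    # Per-feature attribution: (x_i - baseline_i) * average gradient
    return (x - baseline) * avg_grad

# Hypothetical weights and input, for illustration only.
w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 2.0, 0.5])
baseline = np.zeros_like(x)

attributions = integrated_gradients(x, baseline, w)
# Completeness axiom: attributions should sum to f(x) - f(baseline).
print(attributions.sum())
print(sigmoid(w @ x) - sigmoid(w @ baseline))
```

In an NLU model the gradients would come from the network via automatic differentiation and the baseline would typically be an all-padding or all-zero embedding; how the resulting attributions are aggregated into a per-utterance importance score for sample selection is specific to the paper's method and not reproduced here.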