Large-scale hybrid approach for predicting user satisfaction with conversational agents
Measuring user satisfaction is a challenging task, and it is a critical component in developing large-scale conversational agent systems that serve real users. A widely used approach is to collect human annotation data and use it for evaluation or modeling. Annotation-based approaches are easier to control, but they are hard to scale. A novel alternative is to collect users' direct feedback through a feedback elicitation system embedded in the conversational agent system, and to use the collected feedback to train a machine-learned model for generalization. User feedback is the best proxy for user satisfaction, but it is not available for some ineligible intents and in certain situations. Also, asking for feedback too often can hurt the user experience. The two types of approaches are therefore complementary. In this work, we tackle the user satisfaction assessment problem with a hybrid approach that fuses explicit user feedback with user satisfaction predictions inferred by two machine-learned models, one trained on user feedback data and the other on human annotation data. With this approach, both human annotators and users play critical roles in the model development loop. The hybrid approach is based on a waterfall policy, and experimental results on Amazon Alexa's large-scale data sets show significant improvements in inferring user satisfaction. This paper presents a detailed hybrid architecture, an in-depth analysis of user feedback data, and an algorithm that generates data sets to properly simulate live traffic.
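To make the idea of a waterfall policy concrete, the following is a minimal Python sketch of how three satisfaction sources could be cascaded. It is an illustration under stated assumptions, not the system described in this paper: the stage order (explicit feedback first, then the feedback-trained model for eligible intents, then the annotation-trained model as the fallback), the `Turn` type, and the names `waterfall_satisfaction`, `feedback_model`, `annotation_model`, and `eligible_intents` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set, Tuple


@dataclass
class Turn:
    """Hypothetical record of one user interaction."""
    intent: str
    # True = satisfied, False = dissatisfied, None = no feedback collected.
    explicit_feedback: Optional[bool] = None


def waterfall_satisfaction(
    turn: Turn,
    feedback_model: Callable[[Turn], bool],
    annotation_model: Callable[[Turn], bool],
    eligible_intents: Set[str],
) -> Tuple[bool, str]:
    """Return (satisfied, source) using the first applicable stage.

    The stage order here is an assumption made for illustration.
    """
    # Stage 1: explicit user feedback, when it was collected.
    if turn.explicit_feedback is not None:
        return turn.explicit_feedback, "explicit_feedback"
    # Stage 2: model trained on user feedback, only for eligible intents.
    if turn.intent in eligible_intents:
        return feedback_model(turn), "feedback_model"
    # Stage 3: model trained on human annotations covers everything else.
    return annotation_model(turn), "annotation_model"
```

Returning the source alongside the label keeps it visible which stage produced each estimate, which would matter when comparing the stages against one another.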