Beyond accurate answers: Evaluating open-domain question answering in enterprise search
2023
Open-domain question answering (OpenQA) research has grown rapidly in recent years. However, the usability of OpenQA in real-world applications remains largely understudied. In this paper, we evaluate the actual user experience of an OpenQA model deployed in a large tech company’s production enterprise search portal. From qualitative query log analysis and user interviews, our preliminary findings are: 1) A large number of “contingency answers” cannot be evaluated simply by their surface textual value, due to noisy source passages and ambiguous query intents arising from short keyword queries. 2) Contingency answers contribute to a positive search experience by providing “information scent”. 3) Click-through rate (CTR) is a useful user-behavior metric for measuring OpenQA result quality, despite rare cases of “good abandonment”. This exploratory study reveals an often-neglected gap between existing OpenQA research and its search-engine applications, which disconnects offline research efforts from online user experience. We call for reformulating OpenQA model objectives beyond answer face value and for developing new datasets and metrics toward better evaluation protocols.