Ambiguity detection and uncertainty calibration for question answering with large language models

Zhengyan Shi; Giuseppe Castellucci; Simone Filice; Saar Kuzi; Eugene Agichtein; Oleg Rokhlenko; Shervin Malmasi

2025

Large Language Models (LLMs) have demonstrated excellent capabilities in Question Answering (QA) tasks, yet their ability to identify and address ambiguous questions remains underdeveloped. Ambiguous user queries often lead to inaccurate or misleading answers, undermining user trust in these systems. Prior prompt-based attempts at detecting ambiguity have largely performed at the level of random guessing, leaving a significant gap. To address this, we propose a novel framework for detecting ambiguous questions within LLM-based QA systems. We first prompt an LLM to generate multiple answers to a question and then analyze those answers to infer whether the question is ambiguous. For this analysis we use a lightweight Random Forest model trained on a dataset built by bootstrapping and shuffling 6-shot examples. Experimental results on the ASQA, PACIFIC, and ABG-COQA datasets demonstrate the effectiveness of our approach, with accuracy of up to 70.8%. Furthermore, our framework improves the confidence calibration of LLM outputs, leading to more trustworthy QA systems that can handle complex questions.
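To make the "generate multiple answers, then analyze them" step concrete, here is a minimal Python sketch written under our own assumptions. The abstract does not specify the features the paper uses, so `normalize`, `answer_features`, and the three disagreement features below are hypothetical illustrations of how spread among sampled answers could become a feature vector for a classifier such as the Random Forest described in the abstract. The majority-answer frequency doubles as a simple confidence score, and `expected_calibration_error` is the standard binned ECE metric, included only to show how calibration of such scores is commonly measured.

```python
import math
from collections import Counter


def normalize(answer: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace so near-identical answers match."""
    cleaned = "".join(ch for ch in answer.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def answer_features(answers: list[str]) -> dict[str, float]:
    """Turn several sampled answers to ONE question into disagreement features.

    Hypothetical feature set (an assumption, not taken from the paper):
      distinct_ratio - number of distinct normalized answers / number of samples
      norm_entropy   - entropy of the answer distribution, scaled to [0, 1]
      majority_freq  - share of samples agreeing with the most common answer,
                       usable as a simple confidence score
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(normalize(a) for a in answers)
    n = len(answers)
    probs = [c / n for c in counts.values()]
    entropy = sum(-p * math.log(p) for p in probs)
    max_entropy = math.log(n) if n > 1 else 1.0  # avoid dividing by log(1) == 0
    return {
        "distinct_ratio": len(counts) / n,
        "norm_entropy": entropy / max_entropy,
        "majority_freq": max(counts.values()) / n,
    }


def expected_calibration_error(
    confidences: list[float], correct: list[bool], n_bins: int = 10
) -> float:
    """Binned ECE: size-weighted mean of |accuracy - mean confidence| over equal-width bins."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("confidences and correct must be non-empty and equally long")
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # put conf == 1.0 in the last bin
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if bucket:
            mean_conf = sum(c for c, _ in bucket) / len(bucket)
            accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
            ece += len(bucket) / total * abs(accuracy - mean_conf)
    return ece


if __name__ == "__main__":
    # Four samples for one question: three agree on "Paris", one says "Lyon".
    print(answer_features(["Paris", "paris.", "Lyon", "Paris"]))
```

In this sketch, a question whose samples split across several distinct answers gets a high `distinct_ratio` and `norm_entropy`; such vectors, paired with ambiguity labels, would form the training rows for the classifier. Exact-match normalization is a deliberately crude choice that keeps the example dependency-free.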
