Label with confidence: Effective confidence calibration and ensembles in LLM-powered classification
2024
Large Language Models (LLMs) have been employed as crowd-sourced annotators to alleviate the burden of human labeling. However, broader adoption of LLM-based automated labeling systems faces two main challenges: 1) LLMs are prone to producing unexpected and unreliable predictions, and 2) no single LLM excels at all labeling tasks. To address these challenges, we first develop fast and effective logit-based confidence-score calibration pipelines that leverage calibrated LLM confidence scores to accurately estimate an LLM's level of confidence. We propose a novel calibration-error-based sampling strategy to efficiently select labeled data for calibration, reducing calibration error by 46% compared with uncalibrated scores. Leveraging the calibrated confidence scores, we then design a cost-aware cascading LLM ensemble policy that achieves improved accuracy while cutting inference cost by more than a factor of two compared with the conventional weighted-majority-voting ensemble policy.
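The abstract does not spell out the calibration pipeline, but a standard logit-based approach it alludes to is temperature scaling fit on a small labeled calibration set, with expected calibration error (ECE) as the quality metric. The sketch below is a minimal illustration under those assumptions (synthetic logits, NLL-based temperature fitting); it is not the paper's exact pipeline or sampling strategy.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over per-label logits."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

def ece(confidences, correct, n_bins=10):
    """Expected calibration error: bin-weighted gap between
    mean confidence and empirical accuracy within each bin."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    total, err = len(confidences), 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            err += mask.sum() / total * gap
    return err

def fit_temperature(logits, labels):
    """Pick the temperature T that minimizes negative log-likelihood
    on a labeled calibration set (classic temperature scaling)."""
    def nll(T):
        probs = softmax(logits, T)
        return -np.log(probs[np.arange(len(labels)), labels] + 1e-12).mean()
    return minimize_scalar(nll, bounds=(0.05, 10.0), method="bounded").x

# Toy usage: synthetic, overconfident logits over 3 labels stand in
# for an LLM classifier's raw scores.
rng = np.random.default_rng(0)
logits = rng.normal(size=(500, 3)) * 3.0
labels = rng.integers(0, 3, size=500)
T = fit_temperature(logits, labels)
probs = softmax(logits, T)
conf, pred = probs.max(axis=1), probs.argmax(axis=1)
print(f"T = {T:.2f}, ECE after calibration = {ece(conf, pred == labels):.3f}")
```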
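Likewise, the cost-aware cascading ensemble can be pictured as routing each example through models ordered cheap-to-expensive and stopping as soon as the calibrated confidence clears a threshold, so the expensive model is only paid for on hard examples. The sketch below is a hypothetical illustration of that idea; the `Stage` wrapper, stub predictors, and the 0.9 threshold are assumptions, not the paper's actual policy.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Stage:
    name: str
    cost: float                                  # cost per call (arbitrary units)
    predict: Callable[[str], Tuple[str, float]]  # text -> (label, calibrated confidence)

def cascade_classify(text: str, stages: List[Stage], threshold: float = 0.9):
    """Escalate through models ordered cheap-to-expensive; stop early
    once calibrated confidence clears the threshold."""
    spent, best = 0.0, None
    for stage in stages:
        label, conf = stage.predict(text)
        spent += stage.cost
        if best is None or conf > best[1]:
            best = (label, conf)                 # track most confident answer seen
        if conf >= threshold:
            return label, conf, spent            # confident enough: skip pricier models
    return best[0], best[1], spent               # fall back to best answer from the cascade

# Toy usage with stub predictors standing in for calibrated LLMs.
cheap = Stage("small-llm", 1.0, lambda t: ("positive", 0.95 if "great" in t else 0.60))
big = Stage("large-llm", 10.0, lambda t: ("negative", 0.97))
print(cascade_classify("a great movie", [cheap, big]))    # stops at the cheap model
print(cascade_classify("ambiguous review", [cheap, big])) # escalates to the large model
```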