Improving cascade routing for structured attribute generation with heterogeneous confidence

Samira Mansoori; Andrea Scarinci; Aditya Aggarwal; Suleiman Khan; Ashwin Chandramouli

2025

Multi-model inference systems—whether based on routing, cascading, or unified strategies—often rely on confidence signals to decide when a small language model (SLM) output should be accepted or deferred. While such signals are commonly used in classification and short-form generation, their reliability in structured generation settings remains poorly understood. In this work, we study log-probability confidence in structured attribute value generation, where a model must produce either a schema-compliant VALUE or an ABSTAIN outcome. We show that confidence is prediction-type-conditioned: in our setting, average token log-probability is a stronger error-detection signal for VALUE outputs than for ABSTAIN outputs. As a result, global confidence thresholding yields imbalanced trade-offs, improving VALUE precision at the cost of recall while providing weaker control over abstention behavior. We therefore cast cascade routing as type-aware selective deferral, in which acceptance decisions depend on both the confidence score and the predicted output type, with VALUE thresholds specialized by attribute family. Experiments on a large-scale product attribute generation task show that a fine-tuned SLM combined with selective deferral improves quality–cost trade-offs relative to pooled thresholding. The strongest operating point routes low-confidence VALUE predictions while keeping ABSTAIN predictions from the first-stage model, highlighting the importance of modeling heterogeneous reliability in structured-generation cascades.
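The type-aware selective deferral rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Prediction` class, the `should_defer` function, the attribute-family names, and all threshold values are hypothetical and chosen only for illustration. The sketch assumes the first-stage SLM exposes per-token log-probabilities and that an ABSTAIN outcome is emitted as the literal string `"ABSTAIN"`.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

ABSTAIN = "ABSTAIN"  # assumed literal marker for an abstention outcome


@dataclass
class Prediction:
    """Hypothetical container for one first-stage (SLM) output."""
    attribute_family: str        # e.g. "color"; family names are illustrative
    text: str                    # generated value, or the ABSTAIN marker
    token_logprobs: List[float]  # log-probability of each generated token


def mean_logprob(pred: Prediction) -> float:
    """Average token log-probability, the confidence signal studied here."""
    if not pred.token_logprobs:
        return float("-inf")  # no evidence at all: treat as least confident
    return sum(pred.token_logprobs) / len(pred.token_logprobs)


def should_defer(
    pred: Prediction,
    value_thresholds: Dict[str, float],
    default_value_threshold: float = -0.5,
    abstain_threshold: Optional[float] = None,
) -> bool:
    """Return True if the prediction should be routed to the larger model.

    VALUE outputs are compared against a threshold specific to their
    attribute family. ABSTAIN outputs are kept from the first stage unless
    an explicit abstain_threshold is supplied.
    """
    confidence = mean_logprob(pred)
    if pred.text == ABSTAIN:
        if abstain_threshold is None:
            return False  # keep first-stage ABSTAIN predictions
        return confidence < abstain_threshold
    threshold = value_thresholds.get(pred.attribute_family, default_value_threshold)
    return confidence < threshold


# Illustrative thresholds only; in practice they would be tuned on held-out data.
thresholds = {"color": -0.2, "dimensions": -0.6}

confident_value = Prediction("color", "red", [-0.1, -0.1])
uncertain_value = Prediction("color", "navy blue", [-0.5, -0.3])
same_score_other_family = Prediction("dimensions", "10 x 20 cm", [-0.5, -0.3])
abstention = Prediction("material", ABSTAIN, [-2.0])

decisions = [
    should_defer(p, thresholds)
    for p in (confident_value, uncertain_value, same_score_other_family, abstention)
]
```

In this sketch, two VALUE predictions with the same average log-probability can receive different routing decisions because their attribute families carry different thresholds, while the ABSTAIN prediction stays with the first-stage model regardless of its score. Passing a single threshold for every output type and family would recover the pooled (global) thresholding baseline that the abstract compares against.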
