Improving cascade routing for structured attribute generation with heterogeneous confidence
2025
Multi-model inference systems—whether based on routing, cascading, or unified strategies—often rely on confidence signals to decide when a small language model (SLM) output should be accepted or deferred. While such signals are commonly used in classification and short-form generation, their reliability in structured generation settings remains poorly understood.
In this work, we study log-probability confidence in structured attribute value generation, where a model must produce either a schema-compliant VALUE or an ABSTAIN outcome. We show that confidence is prediction-type-conditioned: in our setting, average token log-probability is a stronger error-detection signal for VALUE outputs than for ABSTAIN outputs. As a result, global confidence thresholding yields imbalanced trade-offs, improving VALUE precision at the cost of recall while providing weaker control over abstention behavior.
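To make the confidence signal concrete, the following is a minimal sketch of average token log-probability and of the pooled (global) thresholding baseline it feeds. The function names, the example log-probabilities, and the threshold are illustrative assumptions and are not taken from the paper.

```python
from typing import Sequence


def mean_token_logprob(token_logprobs: Sequence[float]) -> float:
    """Length-normalized confidence: the mean log-probability of the generated tokens."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return sum(token_logprobs) / len(token_logprobs)


def accept_globally(token_logprobs: Sequence[float], threshold: float) -> bool:
    """Pooled baseline: one threshold for every output, regardless of its type.

    Returns True if the first-stage output is accepted, False if it is deferred.
    """
    return mean_token_logprob(token_logprobs) >= threshold


# Hypothetical per-token log-probabilities for one generated output.
example_logprobs = [-0.1, -0.3]
example_confidence = mean_token_logprob(example_logprobs)
```

Because the same threshold is applied to VALUE and ABSTAIN outputs alike, this baseline cannot account for the signal being more reliable for one output type than the other.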
We therefore cast cascade routing as type-aware selective deferral, in which acceptance decisions depend on both the confidence score and the predicted output type, with VALUE thresholds specialized by attribute family. Experiments on a large-scale product attribute generation task show that a fine-tuned SLM combined with selective deferral improves quality–cost trade-offs relative to pooled thresholding. The strongest operating point routes low-confidence VALUE predictions while keeping ABSTAIN predictions from the first-stage model, highlighting the importance of modeling heterogeneous reliability in structured-generation cascades.
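The type-aware acceptance rule described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the class name, the attribute families, and all threshold values are hypothetical, and the confidence is assumed to be an average token log-probability computed elsewhere.

```python
from dataclasses import dataclass
from typing import Dict, Optional

ABSTAIN = "ABSTAIN"  # assumed sentinel for the abstention outcome


@dataclass
class DeferralPolicy:
    """Acceptance depends on the predicted output type as well as the confidence."""

    value_thresholds: Dict[str, float]  # per-attribute-family VALUE thresholds
    default_value_threshold: float      # fallback for families without their own threshold
    abstain_threshold: Optional[float] = None  # None: always keep first-stage ABSTAIN

    def accept(self, prediction: str, family: str, confidence: float) -> bool:
        """Return True to keep the first-stage output, False to defer it."""
        if prediction == ABSTAIN:
            if self.abstain_threshold is None:
                return True
            return confidence >= self.abstain_threshold
        threshold = self.value_thresholds.get(family, self.default_value_threshold)
        return confidence >= threshold


# Hypothetical thresholds; in practice these would be tuned on held-out data.
policy = DeferralPolicy(
    value_thresholds={"color": -0.2, "dimensions": -0.05},
    default_value_threshold=-0.1,
)

keep_color = policy.accept("red", "color", confidence=-0.15)         # above its family threshold
keep_dims = policy.accept("10 cm", "dimensions", confidence=-0.15)   # below its family threshold
keep_abstain = policy.accept(ABSTAIN, "color", confidence=-3.0)      # ABSTAIN kept regardless
```

Leaving `abstain_threshold` unset mirrors the operating point reported as strongest, in which only low-confidence VALUE predictions are routed and ABSTAIN predictions stay with the first-stage model.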