Towards NLU model robustness to ASR errors at scale
In a large-scale Spoken Language Understanding system, Natural Language Understanding (NLU) models are typically decoupled (i.e., trained and updated independently) from the upstream Automatic Speech Recognition (ASR) system that provides textual hypotheses for the user's voice signal as input to NLU. Such ASR hypotheses often contain errors that cause severe performance degradation, since the downstream NLU models are trained on clean human-annotated transcripts. Furthermore, as the ASR model is updated, the error distribution drifts, making it even harder for NLU models to recover and making manual annotation of erroneous ASR hypotheses impractical. In this paper, we investigate data-efficient techniques, applicable to a wide variety of NLU models employed in large-scale production environments, that make these models robust to ASR errors. We measure the effectiveness of such techniques as both the ASR error distribution and usage patterns change over time.
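One common family of data-efficient robustness techniques is training-time noise injection: clean transcripts are augmented with synthetic ASR-style confusions so the NLU model sees error patterns during training. The sketch below is purely illustrative and not the method proposed in this paper; the confusion table, error rate, and function names are all assumptions, whereas a real pipeline would derive confusions from aligned ASR hypothesis/transcript pairs.

```python
import random

# Hypothetical word-level confusion pairs mimicking typical ASR substitution
# errors; in practice these would be mined from aligned ASR hypotheses and
# human-annotated transcripts.
CONFUSIONS = {
    "weather": ["whether"],
    "play": ["pray"],
    "two": ["to", "too"],
}

def inject_asr_noise(transcript, error_rate=0.3, rng=None):
    """Replace words with plausible ASR confusions at the given rate."""
    rng = rng or random.Random(0)
    noisy = []
    for word in transcript.split():
        options = CONFUSIONS.get(word)
        if options and rng.random() < error_rate:
            noisy.append(rng.choice(options))
        else:
            noisy.append(word)
    return " ".join(noisy)

# Augment a clean training utterance with several noisy variants.
clean = "play the weather song"
augmented = [inject_asr_noise(clean, rng=random.Random(i)) for i in range(3)]
```

Because the augmentation is applied to existing annotated data, the NLU labels carry over unchanged, which is what makes this style of technique data-efficient: no new manual annotation of erroneous hypotheses is needed.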