TINYS2I: A small-footprint utterance classification model with contextual support for on-device SLU
On-device spoken language understanding (SLU) offers the potential for significant latency savings compared to cloud-based processing, as the audio stream does not need to be transmitted to a server. We present Tiny Signal-to-Interpretation (TinyS2I), an end-to-end on-device SLU approach targeted at heavily resource-constrained devices. TinyS2I reduces latency without degrading accuracy by exploiting use cases in which the distribution of utterances users speak to a device is heavy-tailed. The model is tailored to process frequent utterances on-device, with support for dynamic contextual content, while deferring all other requests to the cloud. We demonstrate that TinyS2I achieves accuracy comparable to a strong baseline, while offering latency gains from local processing.