AMuSE: Attentive multilingual speech encoding for zero-prior ASR
2024
Multilingual ASR offers training, deployment, and overall performance benefits, but models trained via simple data pooling are known to suffer from cross-lingual interference. Oracle language information (exact-prior) and language-specific parameters are usually leveraged to overcome this, but such approaches cannot enable seamless, truly multilingual experiences. Existing methods try to overcome this limitation by relying on inferred language information or language-agnostic mixture-of-experts, but they incur additional runtime complexity and/or training cost and are less effective in streaming scenarios. Building on previous studies where models were trained to handle a mixed prior (knowledge that the underlying language belongs to a known group), we propose Attentive Multilingual Speech Encoding (AMuSE), a training framework designed to match exact-prior performance even in the absence of underlying language information at runtime (zero-prior), thereby making the model prior-agnostic. Leveraging AMuSE, we build a zero-prior-enabled LLM-based ASR system that outperforms several exact-prior-driven state-of-the-art benchmarks.
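To make the prior terminology concrete, the sketch below illustrates one generic way a multilingual encoder can consume an exact prior (a one-hot language distribution), a mixed prior (a distribution restricted to a language group), or no prior at all (weights inferred attentively from the audio). This is a minimal, hypothetical illustration of the prior-conditioning idea, not the AMuSE architecture itself; the module and parameter names (`AttentivePriorPooling`, `lang_proj`, `scorer`) are assumptions introduced only for this example.

```python
import torch
import torch.nn as nn


class AttentivePriorPooling(nn.Module):
    """Illustrative sketch: combine per-language encodings under an optional
    language prior. With an exact or mixed prior the weights are supplied;
    with zero prior they are inferred by attention over the encodings."""

    def __init__(self, dim: int, num_langs: int):
        super().__init__()
        # One lightweight projection per supported language (hypothetical design choice).
        self.lang_proj = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_langs))
        # Scorer used to infer language weights when no prior is given.
        self.scorer = nn.Linear(dim, 1)

    def forward(self, x: torch.Tensor, prior: torch.Tensor | None = None) -> torch.Tensor:
        # x: (batch, time, dim) shared encoder output.
        # prior: optional (batch, num_langs) distribution over languages.
        per_lang = torch.stack([proj(x) for proj in self.lang_proj], dim=1)  # (B, L, T, D)
        if prior is None:
            # Zero-prior path: score each language branch from its utterance-level summary.
            scores = self.scorer(per_lang.mean(dim=2)).squeeze(-1)           # (B, L)
            prior = torch.softmax(scores, dim=-1)
        # Weighted combination of the language-specific encodings.
        return torch.einsum("bl,bltd->btd", prior, per_lang)


if __name__ == "__main__":
    enc = AttentivePriorPooling(dim=256, num_langs=8)
    feats = torch.randn(2, 100, 256)
    exact = torch.nn.functional.one_hot(torch.tensor([3, 5]), 8).float()
    print(enc(feats, exact).shape)  # exact-prior conditioning
    print(enc(feats, None).shape)   # zero-prior inference
```

A prior-agnostic training recipe in this spirit would expose the model to all three conditioning regimes during training so that a single deployed model behaves consistently whether or not language information is available at runtime.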