Prompt-tuning in ASR systems for efficient domain-adaptation
Automatic Speech Recognition (ASR) systems are a key component of many products across industry. Many of these systems rely on a complex Acoustic Model (AM) whose output is rescored by a domain-specific Language Model (LM). As ASR systems are deployed in new domains, the memory, maintenance, and data-collection costs of these domain-specific LMs grow. In particular, with the advent of parameter-heavy Transformer-based LMs (Devlin et al., 2019), maintaining a separate LM for every domain becomes practically infeasible. On the other hand, a single generic LM shared across all domains performs worse than multiple domain-specific LMs. A middle ground between performance and cost is therefore needed.

To address this problem, we put forward a methodology based on the recently introduced technique of prompt tuning. Lester et al. (2021) proposed learning the token embeddings of the prompt used to prime an LM for a particular task. Prompts are special tokens describing a task; when prepended to an input sample, they help the model use this task description to better solve the task. For example, instead of fine-tuning a Transformer model on a machine-translation dataset, one can sometimes obtain competitive results simply by prefixing the input of a sufficiently powerful Transformer-based LM with text that describes the translation task. In prompt tuning, rather than writing this prompt by hand, one learns its embeddings from labelled examples of the task while the LM's own weights stay frozen. To the best of our knowledge, we are the first to apply prompt tuning to domain adaptation of ASR systems.
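To make the idea concrete, the following is a minimal NumPy sketch of prompt tuning used for domain-specific rescoring. It is not the authors' implementation: the frozen "LM" here is a hypothetical toy that predicts the next token from the mean of its context vectors (soft-prompt rows followed by token embeddings), whereas the setting described above would use a Transformer. All names (`E`, `W`, `PROMPT_LEN`, `tune_prompt`, `rescore`, `lm_weight`) are illustrative assumptions, as is the rescoring rule `AM score + lm_weight * LM log-probability`. Only the soft prompt is updated during training; `E` and `W` are never modified, so one shared LM serves every domain and each domain costs only a small `PROMPT_LEN x DIM` matrix.

```python
import numpy as np

# Hypothetical toy setup (not the paper's model): a frozen "LM" that predicts the
# next token from the mean of its context vectors (soft-prompt rows + token embeddings).
rng = np.random.default_rng(0)
VOCAB, DIM, PROMPT_LEN = 20, 8, 4
E = rng.normal(size=(VOCAB, DIM))  # frozen token embeddings
W = rng.normal(size=(VOCAB, DIM))  # frozen output projection


def log_softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    return z - np.log(np.exp(z).sum())


def context_vector(prompt, tokens):
    # Prepend the soft prompt to the token embeddings, then average all rows.
    return np.vstack([prompt, E[tokens]]).mean(axis=0)


def sentence_logprob(prompt, sentence):
    # Sum of log p(w_t | prompt, w_<t) for t >= 1; the first token is treated as given.
    return sum(
        log_softmax(W @ context_vector(prompt, sentence[:t]))[sentence[t]]
        for t in range(1, len(sentence))
    )


def prompt_grad(prompt, sentence):
    # Gradient of the negative log-likelihood w.r.t. the prompt only; E and W stay frozen.
    grad = np.zeros_like(prompt)
    for t in range(1, len(sentence)):
        p = np.exp(log_softmax(W @ context_vector(prompt, sentence[:t])))
        p[sentence[t]] -= 1.0                   # dNLL/dlogits = softmax - one_hot
        grad += (W.T @ p) / (len(prompt) + t)   # each prompt row has weight 1/(k+t) in the mean
    return grad


def tune_prompt(domain_sentences, steps=200, lr=0.05):
    # Plain gradient descent on the soft prompt, averaged over the domain's sentences.
    prompt = np.zeros((PROMPT_LEN, DIM))
    for _ in range(steps):
        g = sum(prompt_grad(prompt, s) for s in domain_sentences) / len(domain_sentences)
        prompt -= lr * g
    return prompt


def rescore(nbest, prompt, lm_weight=0.5):
    # nbest: list of (token_ids, am_score); rank by AM score plus weighted prompted-LM score.
    scored = [(am + lm_weight * sentence_logprob(prompt, toks), toks) for toks, am in nbest]
    return [toks for _, toks in sorted(scored, key=lambda x: -x[0])]


# Usage: tune a prompt on (made-up) in-domain token sequences, then rescore an n-best list.
domain_text = [[1, 2, 3, 4], [1, 2, 3, 5], [1, 2, 3, 4]]
init_prompt = np.zeros((PROMPT_LEN, DIM))  # untuned, all-zero prompt
domain_prompt = tune_prompt(domain_text)
print(sum(sentence_logprob(init_prompt, s) for s in domain_text))
print(sum(sentence_logprob(domain_prompt, s) for s in domain_text))

nbest = [([1, 2, 3, 4], -5.0), ([1, 2, 7, 9], -4.8)]
print(rescore(nbest, domain_prompt))
```

Because this toy LM simply averages its context, every prompt row receives the same gradient; in a Transformer, position-dependent attention would let the individual prompt vectors specialise. The design point carries over regardless: adapting to a new domain means training and storing a new prompt, not a new LM.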