Listen, know and spell: Knowledge-infused subword modeling for improving ASR performance of out-of-vocabulary (OOV) named entities
2022
Automatic speech recognition (ASR) is increasingly being used in specialized domains such as medical ASR and news transcription. Owing to the lack of high-quality annotated speech data in such domains, off-the-shelf models are commonly fine-tuned on domain-specific data. This poses a significant challenge for transcribing long-tail expressions and out-of-vocabulary (OOV) named entities. On the other hand, readily available knowledge graphs (KGs) provide semantically structured knowledge about such domain-specific named entities. In this work, we propose the Knowledge Infused Subword Model (KISM), a novel technique for incorporating semantic context from KGs into the ASR pipeline to improve recognition of OOV named entities. Our experiments show that KISM improves the OOV recall of an ASR model by 4.58% (absolute) for named entities that were not seen during training.
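The abstract does not specify how KG context is injected into the subword representations. As a purely illustrative sketch (not the paper's actual method), one common way to infuse external entity knowledge is to attend from a subword embedding over a set of KG entity embeddings and project the fused vector; all names and dimensions below are hypothetical assumptions.

```python
import numpy as np

def fuse_kg_context(subword_emb, kg_entity_embs, proj):
    """Fuse a subword embedding with KG entity context (illustrative sketch).

    subword_emb:    (d,) embedding of one subword unit
    kg_entity_embs: (n, d) embeddings of candidate KG entities
    proj:           (d_out, 2*d) learned projection matrix
    """
    # Dot-product attention scores of the subword over KG entities
    scores = kg_entity_embs @ subword_emb          # (n,)
    # Numerically stable softmax to get attention weights
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Weighted sum of entity embeddings = KG context vector
    kg_context = weights @ kg_entity_embs          # (d,)
    # Concatenate subword and KG context, then project
    return proj @ np.concatenate([subword_emb, kg_context])  # (d_out,)
```

In a real system, the fused representation would feed the downstream acoustic or language model layers; the fusion operator (attention, gating, concatenation) is a design choice the abstract leaves open.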