Robust Multi-Channel Speech Recognition using Frequency Aligned Network
2020
Conventional speech enhancement techniques such as beamforming have known benefits for far-field speech recognition. Our own work in frequency-domain multi-channel acoustic modeling has shown additional improvements by training a spatial filtering layer jointly within an acoustic model. In this paper, we further develop this idea and use a frequency-aligned network for robust multi-channel automatic speech recognition (ASR). Unlike an affine layer in the frequency domain, the proposed frequency-aligned component prevents one frequency bin from influencing other frequency bins. We show that this modification not only reduces the number of parameters in the model but also significantly improves ASR performance. We investigate the effects of frequency-aligned networks through ASR experiments on real-world far-field data where users interact with an ASR system in uncontrolled acoustic environments. Our multi-channel acoustic model with a frequency-aligned network achieves up to an 18% relative reduction in word error rate.
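To make the distinction concrete, the following is a minimal NumPy sketch contrasting a full frequency-domain affine layer with a frequency-aligned (per-bin) layer. The channel count, bin count, and real-valued weights here are illustrative assumptions, not the paper's actual implementation, which operates on complex STFT features trained jointly with the acoustic model.

```python
import numpy as np

# Illustrative shapes (assumptions): C = 2 microphone channels, F = 127 frequency bins.
C, F = 2, 127

# A full affine layer in the frequency domain mixes all C*F inputs into every
# output bin, so any bin can influence any other: F x (C*F) weights.
W_affine = np.random.randn(F, C * F) * 0.01

# A frequency-aligned layer keeps bins separate: one small C-vector of weights
# per frequency bin, so bin f never influences bin f' != f. Only F x C weights.
W_aligned = np.random.randn(F, C) * 0.01

def affine_layer(x):
    # x: (C, F) multi-channel spectrum, flattened and fully mixed across bins.
    return W_affine @ x.reshape(C * F)          # output shape (F,)

def frequency_aligned_layer(x):
    # Per-bin combination: output[f] depends only on x[:, f].
    return np.einsum('fc,cf->f', W_aligned, x)  # output shape (F,)

x = np.random.randn(C, F)
print(affine_layer(x).shape, frequency_aligned_layer(x).shape)
print('affine params:', W_affine.size, 'aligned params:', W_aligned.size)
```

Running the sketch shows the parameter reduction the abstract refers to: the aligned layer uses C*F weights instead of C*F*F, while constraining each output bin to its own input bin.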