On the robustness of deep learning-based speech enhancement
2022
In this paper, we present the design of a robust deep neural network based speech enhancement (DNNSE) solution for joint noise reduction and dereverberation under real-world acoustic conditions. This robustness makes the proposed solution suitable for smart-speaker products, which encounter a wide variety of acoustic challenges in real-world deployment. We provide a systematic introduction to the acoustic challenges involved in real-world products and perform a detailed analysis comparing DNNSE models on short-time objective intelligibility (STOI), scale-invariant signal-to-distortion ratio (SI-SDR), and generalizability to unseen acoustic conditions. We then develop a robust DNNSE solution and a robust training procedure, both well suited to the acoustic challenges specified in the paper. Through detailed analysis, we demonstrate that our DNNSE solution performs and generalizes better than a baseline solution that is 5x-6x larger.
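As a point of reference for the SI-SDR metric named above, the following is a minimal sketch of the commonly used definition: the estimate is projected onto the clean reference, and the energy of that projection is compared with the energy of the residual. The function name `si_sdr`, the `eps` stabilizer, and the zero-mean normalization are assumptions made for illustration; this is not the paper's evaluation code, and a practical implementation would typically use an array library rather than plain lists.

```python
import math


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB (illustrative sketch).

    `estimate` and `reference` are equal-length sequences of samples.
    `eps` is a hypothetical stabilizer that avoids division by zero.
    """
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")

    n = len(reference)
    # Remove the mean of each signal so a DC offset does not affect the score.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [e - est_mean for e in estimate]
    ref = [r - ref_mean for r in reference]

    # Optimal scaling of the reference: projection of the estimate onto it.
    ref_energy = sum(r * r for r in ref)
    alpha = sum(e * r for e, r in zip(est, ref)) / (ref_energy + eps)

    # Split the estimate into a scaled-target part and a residual part.
    target = [alpha * r for r in ref]
    residual = [e - t for e, t in zip(est, target)]

    target_energy = sum(t * t for t in target)
    residual_energy = sum(x * x for x in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))
```

Because the reference is rescaled by the projection coefficient, multiplying the estimate by any nonzero gain leaves the score essentially unchanged, which is what distinguishes SI-SDR from plain SDR when comparing enhancement models with different output levels.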