Raw Waveform-Based End-to-End Deep Convolutional Network for Spatial Localization of Multiple Acoustic Sources
2020
In this paper, we present an end-to-end deep convolutional neural network that operates on multi-channel raw audio data to localize multiple simultaneously active acoustic sources in space. Previously reported deep-learning-based approaches localize a single source well directly from multi-channel raw audio, but they do not extend easily to multiple sources because of the well-known permutation problem: when a network outputs one coordinate pair per source, the order in which the sources are assigned to the outputs is arbitrary. We propose a novel encoding scheme to represent the spatial coordinates of multiple sources. The scheme enables end-to-end 2D localization of multiple sources, avoids the permutation problem, and achieves arbitrary spatial resolution. Experiments on a simulated data set and on real recordings from the AV16.3 Corpus demonstrate that the proposed method generalizes well to unseen test conditions and outperforms a recent time-difference-of-arrival (TDOA) based multiple-source localization approach reported in the literature.
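The abstract does not spell out the encoding, so the following is only an illustrative sketch of why an order-free target sidesteps the permutation problem. It assumes a hypothetical grid layout in which each cell stores an activity flag plus a continuous offset, with at most one source per cell; the function names, the 4 m x 4 m room, and the 4 x 4 grid are all assumptions made for the example and are not taken from the paper.

```python
import numpy as np

def ordered_mse(pred, ref):
    """Mean squared error between two ordered lists of (x, y) coordinates."""
    return float(np.mean((np.asarray(pred) - np.asarray(ref)) ** 2))

def encode_grid(sources, room=(4.0, 4.0), cells=(4, 4)):
    """Encode an unordered set of 2D positions as a per-cell target.

    Hypothetical layout: each cell holds (active, dx, dy), where dx and dy
    are the source offsets inside the cell, normalized to [0, 1).
    Assumes at most one source falls in any cell.
    """
    target = np.zeros((cells[0], cells[1], 3))
    cell_w = room[0] / cells[0]
    cell_h = room[1] / cells[1]
    for x, y in sources:
        i = min(int(x // cell_w), cells[0] - 1)
        j = min(int(y // cell_h), cells[1] - 1)
        target[i, j] = (1.0, x / cell_w - i, y / cell_h - j)
    return target

def decode_grid(target, room=(4.0, 4.0), threshold=0.5):
    """Recover continuous (x, y) positions from the per-cell target."""
    cells = target.shape[:2]
    cell_w = room[0] / cells[0]
    cell_h = room[1] / cells[1]
    positions = []
    for i, j in zip(*np.where(target[..., 0] > threshold)):
        x = (i + target[i, j, 1]) * cell_w
        y = (j + target[i, j, 2]) * cell_h
        positions.append((float(x), float(y)))
    return sorted(positions)

# The same two sources, listed in two different orders.
sources_a = [(1.0, 3.0), (2.5, 0.5)]
sources_b = [(2.5, 0.5), (1.0, 3.0)]

# An ordered coordinate list penalizes a correct answer in the "wrong" order.
swap_penalty = ordered_mse(sources_a, sources_b)

# The grid target is identical regardless of the order of the sources.
same_target = np.array_equal(encode_grid(sources_a), encode_grid(sources_b))
```

In this sketch the continuous offsets, rather than the grid size, determine how finely a position can be expressed, which is one way a fixed-size output could still represent coordinates at arbitrary resolution.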