Semi-supervised singing voice separation with noisy self-training

Zhepei Wang; Ritwik Giri; Umut Isik; Jean-Marc Valin; Arvindh Krishnaswamy

2021

Recent progress in singing voice separation has primarily focused on supervised deep learning methods. However, ground-truth data with clean musical sources has long been scarce. Given a limited set of labeled data, we present a method that leverages a large volume of unlabeled data to improve the model’s performance. Following the noisy self-training framework, we first train a teacher network on the small labeled dataset and infer pseudo-labels from the large corpus of unlabeled mixtures. Then, a larger student network is trained on the combined ground-truth and self-labeled datasets. Empirical results show that the proposed self-training scheme, along with data augmentation methods, effectively leverages the large unlabeled corpus and obtains superior performance compared to supervised methods.
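The teacher-student procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the `RandomFeatureSeparator` class, the `augment` function, the hidden sizes, and the random-gain augmentation are all hypothetical stand-ins chosen so the example runs with NumPy alone. It only shows the order of the steps: fit a small teacher on labeled pairs, pseudo-label the unlabeled mixtures, then fit a larger student on the combined, augmented data.

```python
import numpy as np


class RandomFeatureSeparator:
    """Toy stand-in for a separation network (hypothetical, not the paper's model).

    Maps mixture frames to vocal frames using fixed random ReLU features
    followed by a ridge-regression readout.
    """

    def __init__(self, in_dim, hidden, rng, ridge=1e-2):
        self.W = rng.normal(size=(in_dim, hidden)) / np.sqrt(in_dim)
        self.ridge = ridge
        self.beta = None

    def _features(self, X):
        return np.maximum(X @ self.W, 0.0)

    def fit(self, X, Y):
        H = self._features(X)
        A = H.T @ H + self.ridge * np.eye(H.shape[1])
        self.beta = np.linalg.solve(A, H.T @ Y)
        return self

    def predict(self, X):
        return self._features(X) @ self.beta


def augment(X, Y, rng):
    """Assumed augmentation: one random gain per example, applied to input and target."""
    gain = rng.uniform(0.5, 1.5, size=(X.shape[0], 1))
    return X * gain, Y * gain


def noisy_self_training(X_lab, Y_lab, X_unlab, rng, teacher_hidden=16, student_hidden=64):
    in_dim = X_lab.shape[1]

    # 1) Train the teacher on the small labeled set.
    teacher = RandomFeatureSeparator(in_dim, teacher_hidden, rng).fit(X_lab, Y_lab)

    # 2) Infer pseudo-labels for the unlabeled mixtures.
    pseudo = teacher.predict(X_unlab)

    # 3) Train a larger student on ground-truth plus pseudo-labeled data,
    #    with augmentation acting as the "noise" applied during student training.
    X_all = np.vstack([X_lab, X_unlab])
    Y_all = np.vstack([Y_lab, pseudo])
    X_aug, Y_aug = augment(X_all, Y_all, rng)
    student = RandomFeatureSeparator(in_dim, student_hidden, rng).fit(X_aug, Y_aug)

    return teacher, student, pseudo
```

In this sketch the student has a larger capacity than the teacher only because it uses more random features; a real system would use a larger neural network and audio-specific augmentations.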
