A multimodal strategy for singing language identification

Wo Jae Lee; Emanuele Coviello

2022

Identifying the language in which a song is performed is important for applications such as personalized recommendations, discovery, and search. In this paper, we present an automated multimodal approach to singing language identification that scales to millions of songs. The proposed model uses a variety of song-level features, including a consumption embedding derived from session listening data from a music streaming service, a segment-level vocal embedding computed from the vocal track of a song, and generic timbral features. Our experimental results show that our approach outperforms benchmark models on the singing language identification task, and an ablation study demonstrates the benefit of the multimodal approach. In addition, we present a data augmentation technique that increases the robustness of the model to missing data modalities.
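
The abstract does not say how the augmentation for missing modalities works. One common way to get this kind of robustness is to randomly blank out whole modalities during training, so the classifier learns not to rely on any single input. The sketch below illustrates that general idea only and is not the authors' method. The modality names, the dimensions, the zero-filling choice, and the helper names `drop_modalities` and `concatenate` are all assumptions made for illustration.

```python
import random

# Hypothetical modality names and dimensions, chosen only for illustration.
MODALITY_DIMS = {"consumption": 4, "vocals": 3, "timbre": 2}


def drop_modalities(features, drop_prob, rng):
    """Return a copy of `features` with some modalities zeroed out.

    Each modality is dropped independently with probability `drop_prob`,
    but at least one modality is always kept so the example stays usable.
    Zero-filling is an assumption; a learned "missing" vector is another option.
    """
    names = list(features)
    kept = [name for name in names if rng.random() >= drop_prob]
    if not kept:
        kept = [rng.choice(names)]
    return {
        name: list(vec) if name in kept else [0.0] * len(vec)
        for name, vec in features.items()
    }


def concatenate(features):
    """Join per-modality vectors in a fixed order into one input vector."""
    return [x for name in MODALITY_DIMS for x in features[name]]


# Toy song-level features: every modality is present before augmentation.
rng = random.Random(0)
song = {name: [1.0] * dim for name, dim in MODALITY_DIMS.items()}
augmented = drop_modalities(song, drop_prob=0.5, rng=rng)
model_input = concatenate(augmented)
```

In a training loop, `drop_modalities` would be applied to each example before it reaches the classifier, while evaluation would use whatever modalities are actually available for a song.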