- Interspeech 2022: Training multilingual Neural Text-To-Speech (NTTS) models using only monolingual corpora has emerged as a popular way of building voice-cloning-based Polyglot NTTS systems. To train these models, it is essential to understand how the composition of the training corpora affects the quality of multilingual speech synthesis. In this context, it is common to hear questions such as “Would including …
- Interspeech 2022: We introduce a new automatic evaluation method for speaker similarity assessment that is consistent with human perceptual scores. Modern neural text-to-speech models require a vast amount of clean training data, which is why many solutions switch from single-speaker models to solutions trained on examples from many different speakers. Multi-speaker models bring new possibilities, such as a faster creation …
- Interspeech 2022: We present a streaming, Transformer-based end-to-end automatic speech recognition (ASR) architecture which achieves efficient neural inference through compute cost amortization. Our architecture creates sparse computation pathways dynamically at inference time, resulting in selective use of compute resources throughout decoding, enabling significant reductions in compute with minimal impact on accuracy.
- Interspeech 2022: In this work we improve the style representation for cross-lingual style transfer. Specifically, we improve the Spanish representation across four styles (Newscaster, DJ, Excited, and Disappointed) while maintaining a single speaker identity for which we only have English samples. This is achieved using Learned Conditional Prior VAE (LCPVAE), a hierarchical Variational Auto-Encoder (VAE) approach. A secondary …
- KDD 2022: Multi-Task Learning (MTL) is widely accepted in Natural Language Processing as a standard technique for learning multiple related tasks in one model. Training an MTL model requires having the training data for all tasks available at the same time. As systems usually evolve over time (e.g., to support new functionalities), adding a new task to an existing MTL model usually requires retraining the model …