CAMP: A two-stage approach to modelling prosody in context

Zack Hodari; Alexis Moinet; Sri Karlapati; Jaime Lorenzo Trueba; Tom Merritt; Arnaud Joly; Ammar Abbas; Penny Karanasou; Thomas Drugman

2021

Prosody is an integral part of communication, but it remains an open problem in state-of-the-art speech synthesis. Modelling prosody raises two major issues: (1) prosody varies at a slower rate than other content in the acoustic signal (e.g. segmental information and background noise); (2) determining appropriate prosody without sufficient context is an ill-posed problem. In this paper, we propose solutions to both of these issues. To mitigate the challenge of modelling a slowly varying signal, we learn to disentangle prosodic information using a word-level representation. To alleviate the ill-posed nature of prosody modelling, we use syntactic and semantic information derived from text to learn a context-dependent prior over our prosodic space. Our context-aware model of prosody (CAMP) outperforms the state-of-the-art technique, closing the gap with natural speech by 26%. We also find that replacing attention with a jointly trained duration model improves prosody significantly.
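To make the word-level idea more concrete, here is a minimal sketch; it is not the authors' implementation. It assumes frame-level acoustic features (for example, mel-spectrogram frames) and known per-word frame counts. Each word's frames are averaged into a single vector, giving a representation that changes only once per word rather than once per frame. Word-level vectors are then expanded back to frame rate by repetition, which is how an explicit duration model, rather than attention, can determine alignment. The function names `word_level_pool` and `upsample_to_frames` are hypothetical. Mean pooling also stands in for whatever learned word-level encoder the model actually uses.

```python
from typing import List, Sequence


def word_level_pool(
    frames: Sequence[Sequence[float]], word_durations: Sequence[int]
) -> List[List[float]]:
    """Average frame-level feature vectors within each word span.

    Hypothetical helper: mean pooling is an illustrative assumption,
    standing in for a learned word-level prosody encoder.
    """
    if sum(word_durations) != len(frames):
        raise ValueError("word durations must sum to the number of frames")
    pooled: List[List[float]] = []
    start = 0
    for dur in word_durations:
        if dur <= 0:
            raise ValueError("each word must span at least one frame")
        span = frames[start:start + dur]
        dim = len(span[0])
        # One vector per word: the per-dimension mean over the word's frames.
        pooled.append([sum(frame[d] for frame in span) / dur for d in range(dim)])
        start += dur
    return pooled


def upsample_to_frames(
    word_vectors: Sequence[Sequence[float]], word_durations: Sequence[int]
) -> List[List[float]]:
    """Repeat each word-level vector for its duration in frames.

    This mimics duration-based alignment (an explicit length expansion)
    instead of learning the alignment with attention.
    """
    out: List[List[float]] = []
    for vec, dur in zip(word_vectors, word_durations):
        # Build fresh copies so the output rows do not alias one another.
        out.extend(list(vec) for _ in range(dur))
    return out


if __name__ == "__main__":
    frames = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]  # 3 frames, 2 features each
    durations = [2, 1]  # word 1 spans 2 frames, word 2 spans 1 frame
    words = word_level_pool(frames, durations)
    print(words)  # [[2.0, 1.0], [5.0, 4.0]]
    print(upsample_to_frames(words, durations))
```

Pooling to one vector per word discards fast, frame-level variation such as segmental detail, while the slower prosodic trend is kept. The repetition step shows that, once durations are predicted, mapping word-level information back to frames needs no learned alignment.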