In Other News: A Bi-style Text-to-speech Model for Synthesizing Newscaster Voice with Limited Data

Nishant Prateek; Mateusz Lajszczak; Tom Merritt; Srikanth Ronanki; Jaime Lorenzo Trueba; Trevor Wood; Roberto Barra-Chicote; Thomas Drugman

2019

Neural text-to-speech synthesis (NTTS) models have made significant progress in generating high-quality speech. However, they require a large quantity of training data, which makes creating models for multiple styles expensive and time-consuming. In this paper, different styles of speech are analysed based on their prosodic variations. From this analysis, a model is proposed that synthesises speech in the style of a newscaster with just a few hours of supplementary data. We pose the problem of synthesising in a target style using limited data as that of creating a bi-style model that can synthesise both neutral-style and newscaster-style speech via a one-hot vector that factorises the two styles. We also propose conditioning the model on contextual word embeddings, and we extensively evaluate it against neutral NTTS and neutral concatenative-based synthesis. This model closes the gap in perceived style-appropriateness between natural newscaster-style recordings and neutral speech synthesis by approximately two-thirds.
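To make the abstract's two mechanisms more concrete, here is a minimal sketch, assuming that the one-hot style vector and an optional contextual word embedding are simply concatenated to every encoder time step. That is one common way to condition a sequence model, not necessarily the paper's exact architecture. The style names, the helper functions `one_hot_style`, `condition_encoder_states` and `gap_closed`, and the assumption that word embeddings are already aligned one per encoder step are all hypothetical. `gap_closed` also assumes that "closing the gap by two-thirds" means the model's score covers that fraction of the distance from the neutral baseline to natural recordings.

```python
from typing import List, Sequence

# Hypothetical style inventory for a bi-style model (illustrative assumption).
STYLES = ("neutral", "newscaster")


def one_hot_style(style: str) -> List[float]:
    """Return a one-hot vector identifying the requested style."""
    if style not in STYLES:
        raise ValueError(f"unknown style: {style!r}")
    return [1.0 if s == style else 0.0 for s in STYLES]


def condition_encoder_states(
    encoder_states: Sequence[Sequence[float]],
    style: str,
    word_embeddings: Sequence[Sequence[float]] = (),
) -> List[List[float]]:
    """Append the style one-hot vector (and, optionally, a contextual word
    embedding) to every encoder time step.

    Assumption: word_embeddings, if given, are already aligned so that
    there is exactly one embedding per encoder step.
    """
    if word_embeddings and len(word_embeddings) != len(encoder_states):
        raise ValueError("need one word embedding per encoder step")
    style_vec = one_hot_style(style)
    conditioned = []
    for t, state in enumerate(encoder_states):
        extra = list(word_embeddings[t]) if word_embeddings else []
        conditioned.append(list(state) + style_vec + extra)
    return conditioned


def gap_closed(model: float, baseline: float, natural: float) -> float:
    """Fraction of the baseline-to-natural score gap closed by a model."""
    if natural == baseline:
        raise ValueError("natural and baseline scores must differ")
    return (model - baseline) / (natural - baseline)
```

Because the style is a single one-hot input, the same network weights are shared across both styles, and switching the vector at inference time selects neutral or newscaster output. This is what lets a few hours of newscaster data piggyback on a larger neutral corpus.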
