- ACL 2022: Multiple metrics have been introduced to measure fairness in various natural language processing tasks. These metrics fall roughly into two categories: 1) extrinsic metrics for evaluating fairness in downstream applications and 2) intrinsic metrics for estimating fairness in upstream contextualized language representation models. In this paper, we conduct an extensive correlation study between…
- ACL 2022: Large-scale pre-trained sequence-to-sequence models like BART and T5 achieve state-of-the-art performance on many generative NLP tasks. However, such models pose a great challenge in resource-constrained scenarios owing to their large memory requirements and high latency. To alleviate this issue, we propose to jointly distill and quantize the model, where knowledge is transferred from the full-precision… (a rough sketch of joint distillation and quantization follows this list)
- ACL 2022: In this study, we investigate robustness against covariate drift in spoken language understanding (SLU). Covariate drift can occur in SLU when there is a drift between training and testing regarding what users request or how they request it. To study this, we propose a method that exploits natural variations in data to create a covariate drift in SLU datasets. Experiments show that a state-of-the-art BERT-based… (an illustrative covariate-drift split is sketched after this list)
- ACL 2022: Several methods have been proposed for classifying long textual documents using Transformers. However, there is a lack of consensus on a benchmark to enable a fair comparison among different approaches. In this paper, we provide a comprehensive evaluation of their relative efficacy against various baselines and on diverse datasets — in terms of accuracy as well as time and space overheads. Our…
- ICLR 2022: In NLP, a large volume of tasks involve pairwise comparison between two sequences (e.g., sentence similarity and paraphrase identification). Predominantly, two formulations are used for sentence-pair tasks: bi-encoders and cross-encoders. Bi-encoders produce fixed-dimensional sentence representations and are computationally efficient; however, they usually underperform cross-encoders. Cross-encoders can… (the difference between the two formulations is sketched after this list)
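The joint distillation-and-quantization idea in the second abstract above can be pictured with a minimal sketch. This is not the paper's method: the model classes, the 8-bit fake-quantization helper, and the plain MSE distillation loss are all assumptions chosen only to show the general pattern of training a quantization-aware student against a full-precision teacher.

```python
# Hedged sketch: quantization-aware training of a small "student" guided by a
# full-precision "teacher" via a distillation loss. Model classes and shapes
# are illustrative placeholders, not the paper's architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


def fake_quantize(w: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Uniform symmetric fake quantization with a straight-through estimator."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.abs().max() / qmax + 1e-8
    w_q = torch.round(w / scale).clamp(-qmax - 1, qmax) * scale
    return w + (w_q - w).detach()  # forward uses w_q; gradients flow through w


class TinyEncoder(nn.Module):
    """Placeholder model: embedding + one linear layer, optionally fake-quantized."""

    def __init__(self, vocab=1000, dim=64, quantized=False):
        super().__init__()
        self.quantized = quantized
        self.emb = nn.Embedding(vocab, dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, tokens):
        h = self.emb(tokens).mean(dim=1)  # crude pooled representation
        w = fake_quantize(self.proj.weight) if self.quantized else self.proj.weight
        return F.linear(h, w, self.proj.bias)


teacher = TinyEncoder(quantized=False)  # full-precision teacher
student = TinyEncoder(quantized=True)   # low-precision, quantization-aware student
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

tokens = torch.randint(0, 1000, (8, 16))  # dummy batch of token ids
with torch.no_grad():
    teacher_out = teacher(tokens)
student_out = student(tokens)

# Distillation step: pull the quantized student's outputs toward the teacher's.
loss = F.mse_loss(student_out, teacher_out)
loss.backward()
optimizer.step()
print("distillation loss:", loss.item())
```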
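The covariate-drift abstract above mentions exploiting natural variations in data to create a drift between training and test sets, but the snippet cuts off before describing the method. As an illustration only (an assumed covariate, not the paper's procedure), one way to induce such a shift is to partition utterances by a surface property such as length, so that train and test share intents but differ in how requests are phrased.

```python
# Hedged sketch: inducing a covariate shift in a toy SLU-style dataset by
# splitting on utterance length, so train and test differ in *how* users ask.
# The data and the chosen covariate are illustrative assumptions only.
from dataclasses import dataclass


@dataclass
class Utterance:
    text: str
    intent: str


data = [
    Utterance("play jazz", "PlayMusic"),
    Utterance("weather tomorrow", "GetWeather"),
    Utterance("could you please put on some relaxing jazz music", "PlayMusic"),
    Utterance("what is the weather going to be like tomorrow afternoon", "GetWeather"),
]


def covariate_split(examples, threshold_tokens=4):
    """Short utterances go to train, long ones to test: same intents, shifted phrasing."""
    train = [ex for ex in examples if len(ex.text.split()) <= threshold_tokens]
    test = [ex for ex in examples if len(ex.text.split()) > threshold_tokens]
    return train, test


train, test = covariate_split(data)
print("train:", [ex.text for ex in train])
print("test: ", [ex.text for ex in test])
```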
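The ICLR 2022 abstract contrasts bi-encoders and cross-encoders for sentence-pair tasks. The sketch below shows only the structural difference between the two formulations; the toy `Encoder`, the pooling, and the scoring head are placeholders, not the models evaluated in the paper.

```python
# Hedged sketch: the two standard formulations for sentence-pair scoring.
# `Encoder` is a stand-in for any contextual encoder (e.g., a BERT-like model);
# its internals here are deliberately trivial.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Encoder(nn.Module):
    """Toy encoder: embeds token ids and mean-pools into a fixed-size vector."""

    def __init__(self, vocab=1000, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)

    def forward(self, tokens):               # tokens: (batch, seq_len)
        return self.emb(tokens).mean(dim=1)  # (batch, dim)


class BiEncoder(nn.Module):
    """Encodes each sentence independently; the score is a similarity of the two vectors."""

    def __init__(self):
        super().__init__()
        self.encoder = Encoder()

    def forward(self, a, b):
        return F.cosine_similarity(self.encoder(a), self.encoder(b), dim=-1)


class CrossEncoder(nn.Module):
    """Encodes both sentences as one joint sequence and scores the pooled representation
    (with a real Transformer, tokens of A and B would attend to each other)."""

    def __init__(self):
        super().__init__()
        self.encoder = Encoder()
        self.scorer = nn.Linear(64, 1)

    def forward(self, a, b):
        pair = torch.cat([a, b], dim=1)  # both sentences in a single input sequence
        return self.scorer(self.encoder(pair)).squeeze(-1)


a = torch.randint(0, 1000, (2, 10))  # dummy token ids for sentence A
b = torch.randint(0, 1000, (2, 12))  # dummy token ids for sentence B
print("bi-encoder scores:   ", BiEncoder()(a, b))
print("cross-encoder scores:", CrossEncoder()(a, b))
```

The efficiency gap mentioned in the abstract falls out of this structure: a bi-encoder can precompute and cache each sentence's vector, whereas a cross-encoder must re-encode every candidate pair from scratch.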
Related content
- September 10, 2021: Data augmentation makes examples more realistic, while continual-learning techniques prevent “catastrophic forgetting”.
- September 09, 2021: Model using ASR hypotheses as extra inputs reduces word error rate of human transcriptions by almost 11%.
- September 02, 2021: Branching encoder networks make operation more efficient, while “neural diffing” reduces bandwidth requirements for model updates.
- August 27, 2021: Liu discusses her work in speech recognition and understanding, prosody modeling, summarization, and natural language processing.
- August 27, 2021: New voice for Alexa’s Reading Sidekick feature avoids the instabilities common to models with variable prosody.
- August 25, 2021: Katrin Kirchhoff, director of speech processing for Amazon Web Services, on the many scientific challenges her teams are tackling.