- NAACL 2022. In production SLU systems, new training data becomes available over time, so ML models need to be updated on a regular basis. Specifically, releasing new features adds new classes of data while the old data remains constant. However, retraining the full model from scratch each time is computationally expensive. To address this problem, we propose to consider production releases from the curriculum learning…
- Improving distantly supervised document-level relation extraction through natural language inference. NAACL 2022 Workshop on Deep Learning for Low-Resource NLP. The distant supervision (DS) paradigm has been widely used for relation extraction (RE) to alleviate the need for expensive annotations. However, it suffers from noisy labels, which lead to worse performance than models trained on human-annotated data, even when trained using hundreds of times more data. We present a systematic study on the use of natural language inference (NLI) to improve distantly supervised… (an illustrative NLI-filtering sketch follows this list.)
- MIT Sloan Sports Analytics Conference 2022. Sports broadcasters are increasingly sharing statistical insights throughout the game to tell a richer story for the audience. Thanks to abundant data and advanced statistics, broadcasters can quickly tell stories and make comparisons between teams and players to keep viewers engaged. To keep up with the fast-paced nature of many games, broadcasters rely on template-generated narratives to speak about in-game…
- NeurIPS 2022 Workshop on Trustworthy and Socially Responsible Machine Learning (TSRML). We study the problem of differentially private (DP) fine-tuning of large pre-trained models, a recent privacy-preserving approach suitable for solving downstream tasks with sensitive data. Existing work has demonstrated that high accuracy is possible under strong privacy constraints, yet it requires significant computational overhead or modifications to the network architecture. We propose differentially private… (a generic DP-SGD fine-tuning sketch follows this list.)
- ACL 2022. The goal of meta-learning is to learn to adapt to a new task with only a few labeled examples. Inspired by the recent progress in large language models, we propose in-context tuning (ICT), which recasts task adaptation and prediction as a simple sequence prediction problem: to form the input sequence, we concatenate the task instruction, labeled in-context examples, and the target input to predict; to meta-train… (a sketch of this input construction follows this list.)
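The NLI-for-distant-supervision abstract above does not spell out a mechanism, but one common recipe is to use an off-the-shelf NLI model to verify each distantly supervised (head, relation, tail) instance by checking whether the source passage entails a verbalized form of the relation. The sketch below illustrates only that generic recipe; the model name, hypothesis templates, and threshold are assumptions, not details taken from the paper.

```python
# Generic illustration: filter distantly supervised RE instances with an
# off-the-shelf NLI model. Keep a labeled triple only if the passage
# (premise) entails a verbalized hypothesis for the relation.
# Model, templates, and threshold are assumptions for illustration.
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli")

# Hypothetical verbalization templates for a couple of relation types.
TEMPLATES = {
    "founded_by": "{head} was founded by {tail}.",
    "headquartered_in": "{head} is headquartered in {tail}.",
}

def keep_instance(passage, head, relation, tail, threshold=0.9):
    """Return True if the passage entails the verbalized relation."""
    hypothesis = TEMPLATES[relation].format(head=head, tail=tail)
    # premise = passage, hypothesis = verbalized relation
    result = nli([{"text": passage, "text_pair": hypothesis}])[0]
    return result["label"] == "ENTAILMENT" and result["score"] >= threshold

# Example: a distantly supervised triple that the passage actually supports.
print(keep_instance(
    "Amazon, founded by Jeff Bezos in 1994, is headquartered in Seattle.",
    head="Amazon", relation="founded_by", tail="Jeff Bezos",
))
```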
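For the DP fine-tuning abstract above, the sketch below shows what generic DP-SGD fine-tuning looks like with the Opacus library (per-example gradient clipping plus Gaussian noise). It illustrates the standard approach whose computational overhead the abstract refers to, not the method proposed in the paper; the toy model, data, and hyperparameters are placeholders.

```python
# Minimal, generic DP-SGD fine-tuning sketch using Opacus.
# The model, data, and hyperparameters below are placeholder assumptions.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

# Placeholder "pre-trained" model and toy data for illustration only.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
data = TensorDataset(torch.randn(256, 16), torch.randint(0, 2, (256,)))
loader = DataLoader(data, batch_size=32)

optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
criterion = nn.CrossEntropyLoss()

# Wrap model/optimizer/loader so per-example gradients are clipped and noised.
privacy_engine = PrivacyEngine()
model, optimizer, loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=loader,
    noise_multiplier=1.0,   # scale of Gaussian noise added to clipped gradients
    max_grad_norm=1.0,      # per-example gradient clipping bound
)

model.train()
for epoch in range(3):
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

# Report the privacy budget spent so far (epsilon at a fixed delta).
print("epsilon:", privacy_engine.get_epsilon(delta=1e-5))
```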
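The in-context tuning abstract above does describe how the input sequence is formed: concatenate the task instruction, the labeled in-context examples, and the target input to predict. A minimal sketch of that construction follows; the "Input:/Output:" template and separator are illustrative assumptions rather than the paper's exact format.

```python
# Sketch of assembling an in-context tuning (ICT) input sequence:
# task instruction, then labeled in-context examples, then the target input.
# The prompt template and separator are assumptions for illustration.
def build_ict_sequence(instruction, in_context_examples, target_input, sep="\n"):
    parts = [instruction]
    for text, label in in_context_examples:
        parts.append(f"Input: {text}{sep}Output: {label}")
    # The target input comes last; its output is left for the model to predict.
    parts.append(f"Input: {target_input}{sep}Output:")
    return sep.join(parts)

# Example usage with a toy sentiment task.
seq = build_ict_sequence(
    instruction="Classify the sentiment of the review as positive or negative.",
    in_context_examples=[("Great movie!", "positive"), ("Terrible plot.", "negative")],
    target_input="I loved the soundtrack.",
)
print(seq)
```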
Related content
- November 19, 2020: AI models exceed human performance on public data sets; modified training and testing could help ensure that they aren’t exploiting short cuts.
- November 16, 2020: Amazon Scholar Julia Hirschberg on why speech understanding and natural-language understanding are intertwined.
- November 11, 2020: With a new machine learning system, Alexa can infer that an initial question implies a subsequent request.
- November 10, 2020: Alexa senior applied scientist provides career advice to graduate students considering a research role in industry.
- November 9, 2020: Watch a recording of the EMNLP 2020 session featuring a discussion with Amazon scholars and academics on the state of conversational AI.
- November 6, 2020: Work aims to improve accuracy of models both on- and off-device.