- Interspeech 2022: The recurrent neural network transducer (RNN-T) is a prominent streaming end-to-end (E2E) ASR technology. In RNN-T, the acoustic encoder commonly consists of stacks of LSTMs. Very recently, as an alternative to LSTM layers, the Conformer architecture was introduced, where the encoder of RNN-T is replaced with a modified Transformer encoder composed of convolutional layers at the frontend and between attention…
- Interspeech 2022: This paper investigates an incremental learning framework for a real-world voice assistant employing an RNN-Transducer-based automatic speech recognition (ASR) model. Such a model needs to be regularly updated to keep up with the changing distribution of customer requests. We demonstrate that a simple fine-tuning approach with a combination of old and new training data can be used to incrementally update the model…
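A common way to realize such an incremental update is rehearsal: fine-tune on all new data plus a replayed sample of old data to limit forgetting. The sketch below is illustrative only; the helper name `build_update_set` and the `old_fraction` ratio are assumptions for demonstration, not details from the paper:

```python
import random

def build_update_set(old_data, new_data, old_fraction=0.3, seed=0):
    # Rehearsal-style mix: keep all new data, replay a random sample of
    # old data so fine-tuning does not forget earlier behavior.
    # old_fraction is an illustrative hyperparameter.
    rng = random.Random(seed)
    k = int(len(old_data) * old_fraction)
    replay = rng.sample(list(old_data), k)
    mixed = replay + list(new_data)
    rng.shuffle(mixed)
    return mixed

# Hypothetical utterance IDs standing in for training examples.
old = [f"old_utt_{i}" for i in range(1000)]
new = [f"new_utt_{i}" for i in range(200)]
train_set = build_update_set(old, new)
```

The fixed seed makes the sampled replay set reproducible across model updates, which helps when comparing successive fine-tuned checkpoints.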
- Interspeech 2022: Inference with large deep learning models in resource-constrained settings is increasingly a bottleneck in real-world applications of state-of-the-art AI. Here we address this with low-precision weight quantization. We achieve very low accuracy degradation by reparameterizing the weights in a way that leaves the weight distribution approximately uniform. We show lower bit-width quantization and less accuracy…
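The abstract's point is that a near-uniform weight distribution lets uniform low-bit quantization lose very little information. As a minimal sketch of the quantization step only (the paper's reparameterization is not reproduced here, and `quantize_uniform` with a 4-bit setting is an illustrative choice), affine uniform quantization of a weight tensor looks like this:

```python
import numpy as np

def quantize_uniform(w, bits=4):
    # Affine (uniform) quantization: map weights onto 2**bits evenly
    # spaced levels spanning [w.min(), w.max()], then dequantize.
    lo, hi = w.min(), w.max()
    scale = (hi - lo) / (2**bits - 1)
    q = np.round((w - lo) / scale)   # integer codes in [0, 2**bits - 1]
    return q * scale + lo            # dequantized approximation of w

rng = np.random.default_rng(0)
w = rng.normal(size=1000).astype(np.float32)
w_q = quantize_uniform(w, bits=4)
err = np.abs(w - w_q).max()          # bounded by half a quantization step
```

For a uniform grid, the worst-case per-weight error is half a step, `scale / 2`; a Gaussian-shaped tensor like `w` wastes many levels in the tails, which is exactly the inefficiency a uniformizing reparameterization targets.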
- Interspeech 2022: Creating realistic and natural-sounding synthetic speech remains a big challenge for voice identities unseen during training. As there is growing interest in synthesizing voices of new speakers, here we investigate the ability of normalizing flows in text-to-speech (TTS) and voice conversion (VC) modes to extrapolate from speakers observed during training to create unseen speaker identities. Firstly, we…
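Normalizing flows are exactly invertible maps between data and a simple latent distribution, which is what makes it possible to sample latent points and decode them into identities never seen in training. A minimal single-layer affine flow (purely illustrative; real TTS/VC flows stack many conditioned coupling layers) demonstrates the invertibility:

```python
import numpy as np

class AffineFlow:
    # Minimal invertible affine transform, the simplest building block
    # of a normalizing flow (hypothetical sketch, not the paper's model).
    def __init__(self, log_scale, shift):
        self.log_scale = np.asarray(log_scale, dtype=float)
        self.shift = np.asarray(shift, dtype=float)

    def forward(self, x):
        # data -> latent: normalize toward the base distribution
        return (x - self.shift) * np.exp(-self.log_scale)

    def inverse(self, z):
        # latent -> data: sampling fresh z here yields new "identities"
        return z * np.exp(self.log_scale) + self.shift

flow = AffineFlow(log_scale=[0.5, -0.2], shift=[1.0, 2.0])
x = np.array([3.0, 4.0])
z = flow.forward(x)
x_rec = flow.inverse(z)   # recovers x exactly, up to float rounding
```

Because `inverse` undoes `forward` exactly, the model can be trained by maximum likelihood in the latent space and still generate by sampling and inverting, which is the property the paper probes for speaker extrapolation.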
-
Computer Assisted Language Learning Journal2022The research community has long studied computer-assisted pronunciation training (CAPT) methods in non-native speech. Researchers focused on studying various model architectures, such as Bayesian networks and deep learning methods, as well as on the analysis of different representations of the speech signal. Despite significant progress in recent years, existing CAPT methods are not able to detect pronunciation
Related content
- March 10, 2025: Inaugural global university competition focused on advancing secure, trusted AI-assisted software development.
- February 20, 2025: Using large language models to generate training data and updating models through both fine-tuning and reinforcement learning improves the success rate of code generation by 39%.
- February 6, 2025: Novel training procedure and decoding mechanism enable model to outperform much larger foundation model prompted to perform the same task.
- December 11, 2024: LLM-augmented clustering enables QualIT to outperform other topic-modeling methods in both topic coherence and topic diversity.
- December 9, 2024: The Amazon AGI SF Lab will focus on developing new foundational capabilities for enabling useful AI agents.
- December 4, 2024: Amazon Nova Canvas and Amazon Nova Reel use diffusion transformers to deliver studio-quality visual content.