- Instruction following is a key capability for LLMs. However, recent studies have shown that LLMs often struggle with instructions containing multiple constraints (e.g., a request to create a social media post “in a funny tone” with “no hashtag”). Despite this, most evaluations focus solely on synthetic data. To address this, we introduce RealInstruct, the first benchmark designed to evaluate LLMs’ ability…
- 2024: Entity matching is the task of linking records from different sources that refer to the same real-world entity. Past work has primarily treated entity linking as a standard supervised learning problem. However, supervised entity matching models often do not generalize well to new data, and collecting exhaustive labeled training data is often cost prohibitive. Further, recent efforts have adopted LLMs for…
- 2024: Training with mixed data distributions is a common and important part of creating multi-task and instruction-following models. The diversity of the data distributions and the cost of joint training make the optimization procedure extremely challenging. Data mixing methods partially address this problem, albeit with suboptimal performance across data sources and the need for multiple expensive training runs.
- 2024: We propose a constraint learning schema for fine-tuning Large Language Models (LLMs) with attribute control. Given a training corpus and control criteria formulated as a sequence-level constraint on model outputs, our method fine-tunes the LLM on the training corpus while enhancing constraint satisfaction with minimal impact on its utility and generation quality. Specifically, our approach regularizes the… (see the constraint-penalty sketch after this list).
- Findings of EMNLP 2024: Language Models for text classification often produce overconfident predictions for both in-distribution and out-of-distribution samples, i.e., the model’s output probabilities do not match their accuracy. Prior work showed that simple post-hoc approaches are effective for mitigating this issue, but are not robust in noisy settings, e.g., when the distribution shift is caused by spelling mistakes. In this… (a sketch of one such post-hoc approach, temperature scaling, follows this list).
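As context for the “simple post-hoc approaches” mentioned in the last abstract, one common example (not necessarily the approach the paper studies) is temperature scaling: a single temperature is fit on held-out validation logits to minimize negative log-likelihood, which typically softens overconfident probabilities without changing the predicted classes. A minimal NumPy/SciPy sketch, with all names hypothetical:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import log_softmax

def fit_temperature(val_logits: np.ndarray, val_labels: np.ndarray) -> float:
    """Fit one temperature on held-out logits (shape (N, C)) and gold labels (shape (N,))."""
    def nll(temperature: float) -> float:
        # Negative log-likelihood of the gold labels under softmax(logits / T).
        scaled = log_softmax(val_logits / temperature, axis=-1)
        return -scaled[np.arange(len(val_labels)), val_labels].mean()
    # Search a bounded range of temperatures for the one minimizing validation NLL.
    return float(minimize_scalar(nll, bounds=(0.05, 10.0), method="bounded").x)

def calibrated_probs(logits: np.ndarray, temperature: float) -> np.ndarray:
    # Rescaled probabilities; the argmax (and hence accuracy) is unchanged.
    return np.exp(log_softmax(logits / temperature, axis=-1))
```

Because a fitted temperature above 1 only flattens the softmax, accuracy is untouched while the gap between confidence and accuracy shrinks; the abstract’s point is that such fixes can break down under noisy shifts like spelling mistakes.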
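The constraint-learning abstract above is cut off before it describes its regularizer, so the following is only a generic illustration of the recipe it gestures at, not the authors’ method: fine-tune on the corpus with the usual language-modeling loss, and add a REINFORCE-style penalty that raises the likelihood of sampled outputs satisfying a user-supplied sequence-level constraint. `model`, `batch`, and `constraint_satisfied` are placeholder names assumed for the sketch.

```python
import torch

def constrained_finetune_loss(model, batch, constraint_satisfied, penalty_weight=1.0):
    """LM loss plus a reward-weighted log-likelihood term for constraint satisfaction.

    Assumes a Hugging Face-style causal LM, a batch dict with input_ids /
    attention_mask / labels, and a user function `constraint_satisfied` that
    maps a list of generated token ids to True/False.
    """
    # 1) Usual next-token cross-entropy on the training corpus.
    lm_loss = model(**batch).loss

    # 2) Sample continuations from the current model (no gradient through sampling).
    with torch.no_grad():
        samples = model.generate(batch["input_ids"], max_new_tokens=64, do_sample=True)

    # 3) Score each sample against the sequence-level constraint (1.0 or 0.0) and
    #    nudge up the likelihood of satisfying sequences (REINFORCE-style surrogate).
    rewards = torch.tensor(
        [float(constraint_satisfied(s.tolist())) for s in samples],
        device=lm_loss.device,
    )
    logits = model(input_ids=samples).logits[:, :-1, :]
    token_logp = torch.log_softmax(logits, dim=-1).gather(
        -1, samples[:, 1:].unsqueeze(-1)
    ).squeeze(-1)
    seq_logp = token_logp.sum(dim=-1)
    constraint_loss = -(rewards * seq_logp).mean()

    return lm_loss + penalty_weight * constraint_loss
```

In practice a baseline or length normalization would reduce the variance of this penalty; the sketch keeps only the structure: corpus loss plus a weighted term that rewards constraint-satisfying generations.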
Related content
- July 11, 2022: The SCOT science team used lessons from the past — and improved existing tools — to contend with “a peak that lasted two years”.
- July 08, 2022: Industry track chair and Amazon principal research scientist Rashmi Gangadharaiah on trends in industry papers and the challenges of building practical dialogue systems.
- July 08, 2022: New model sets new standard in accuracy while enabling 60-fold speedups.
- July 07, 2022: The breadth and originality of Amazon’s natural-language-processing research are on display at the annual meeting of the North American chapter of the Association for Computational Linguistics.
- June 29, 2022: President’s visit part of a mission to preserve the Icelandic language in the digital age.
- June 28, 2022: Amazon’s TabTransformer model is now available through SageMaker JumpStart and the official release of the Keras open-source library.