Conversational AI

Building software and systems that help people communicate with computers naturally, as if communicating with family and friends.

Context-aware dynamic pruning for speech foundation models

Masao Someki, Yifan Peng, Siddhant Arora, Shinji Watanabe, Markus Müller, Thanasis Mouchtaris, Grant Strimel, Jing Liu

ICLR 2025

2025

Foundation models, such as large language models, have achieved remarkable success in natural language processing and are evolving into models capable of handling multiple modalities. Listening ability, in particular, is crucial for many applications, leading to research on building speech foundation models. However, the high computational cost of these large models presents a significant challenge for

Related: Pruning network nodes on the fly to improve LLM efficiency

Amazon Nova Sonic: Technical report and model card

Amazon Artificial General Intelligence

Amazon Technical Reports

2025

We present Amazon Nova Sonic, a new multimodal foundation model that unifies speech and text processing in a single architecture, delivering frontier voice intelligence and industry-leading price performance. Amazon Nova Sonic ("Nova Sonic") builds on the advances in large pre-trained text and speech models, while fusing the two modalities in a unified architecture to power downstream tasks requiring both

QID: Efficient query-informed ViTs in data-scarce regimes for OCR-free visual document understanding

Binh Le, Shaoyuan Xu, Jinmiao Fu, Zhishen (Leo) Huang, Moyan Li, Yanhui Guo, Hongdong Li, Sameera Ramasinghe, Bryan Wang

CVPR 2025

2025

In Visual Document Understanding (VDU) tasks, fine-tuning a pre-trained Vision-Language Model (VLM) with new datasets often falls short in optimizing the vision encoder to identify query-specific regions in text-rich document images. Existing methods that directly inject queries into model layers by modifying the network architecture often struggle to adapt to new datasets with limited annotations. To

UTFix: Change aware unit test repairing using LLM

Shanto Rahman, Sachit Kuhar, Berk Cirisci, Pranav Garg, Shiqi Wang, Xiaofei Ma, Anoop Deoras, Baishakhi Ray

OOPSLA 2025

2025

Software updates, including bug repair and feature additions, are frequent in modern applications, but they often leave test suites outdated, resulting in undetected bugs and increased chances of system failures. A recent study by Meta revealed that 14%-22% of software failures stem from outdated tests that fail to reflect changes in the codebase. This highlights the need to keep tests in sync with code

CoLLM: A large language model for composed image retrieval

Chuong Huynh, Jinyu Yang, Ashish Tawari, Mubarak Shah, Son Tran, Raffay Hamid, Trishul Chilimbi, Abhinav Shrivastava

CVPR 2025

2025

Composed Image Retrieval (CIR) is a complex task that aims to retrieve images based on a multimodal query. Typical training data consists of triplets containing a reference image, a textual description of desired modifications, and the target image, which are expensive and time-consuming to acquire. The scarcity of CIR datasets has led to zero-shot approaches utilizing synthetic triplets or leveraging vision-language

Computer vision

Amazon at ACL: How to teach machines to reason

Larry Hardesty

July 29, 2021

Amazon’s Dan Roth on a hot new research topic — that he’s been studying for more than 25 years.


Alexa Prize Socialbot Grand Challenge 4 finalists announced

Alexa Prize team

July 19, 2021

Five teams to compete for $500,000 first prize; winners will be announced in August 2021.

Alexa & Friends features Amazon Scholar Julia Hirschberg

Staff writer

July 15, 2021

Hirschberg explains why mastering empathetic speech is critical for successful dialogue systems.

Amazon paper exposes biases in unreliable-news datasets

Heba Elfardy, Christos Christodoulopoulos, Thomas Butler

July 15, 2021

The paper, which received honorable mention at EACL, presents guidelines for better analysis and construction of datasets.

Searching video using natural-language descriptions

Mohsen Malmir

July 12, 2021

New method uses cross-attention and multitask training to improve the accuracy and training efficiency of video moment retrieval.

Search and information retrieval
Third Conference on Truth and Trust Online open for submissions

Christos Christodoulopoulos

July 9, 2021

The conference’s mission is to bring together stakeholders working toward improving the truthfulness and trustworthiness of online communications.
