-
2025: Large language models (LLMs) have demonstrated impressive capabilities across diverse tasks, but they remain susceptible to hallucinations: generating content that appears plausible but contains factual inaccuracies. We present FINCH-ZK, a black-box framework that leverages FINe-grained Cross-model consistency to detect and mitigate Hallucinations in LLM outputs without requiring external knowledge sources …
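The abstract above only names the idea of cross-model consistency checking; a minimal illustrative sketch of that general idea follows. This is not FINCH-ZK's actual algorithm: the `agreement` measure (token Jaccard overlap) and the flagging threshold are assumptions chosen for the example.

```python
# Illustrative sketch of cross-model consistency checking for hallucination
# detection. NOT the FINCH-ZK algorithm; the agreement metric and threshold
# are assumptions for demonstration only.

def agreement(answer_a: str, answer_b: str) -> float:
    """Token-overlap (Jaccard) agreement between two model answers."""
    a, b = set(answer_a.lower().split()), set(answer_b.lower().split())
    return len(a & b) / len(a | b) if a | b else 1.0

def flag_inconsistent(claims, cross_model_answers, threshold=0.5):
    """Flag claims whose answers disagree across models.

    `cross_model_answers[i]` holds several models' answers about claim i;
    a claim is flagged when mean pairwise agreement falls below threshold.
    """
    flagged = []
    for claim, answers in zip(claims, cross_model_answers):
        pairs = [(x, y) for i, x in enumerate(answers) for y in answers[i + 1:]]
        mean_agree = sum(agreement(x, y) for x, y in pairs) / len(pairs)
        if mean_agree < threshold:
            flagged.append(claim)
    return flagged
```

In practice a real system would compare fine-grained claims (not whole answers) and use a stronger semantic-equivalence judge than token overlap.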
-
Amazon Technical Reports, 2025: We present Amazon Nova Multimodal Embeddings (MME), a state-of-the-art multimodal embedding model for agentic RAG and semantic search applications. Nova MME is the first embeddings model that supports five modalities as input: text, documents, images, video, and audio, and transforms them into a single, unified embedding space. This powerful capability enables cross-modal retrieval, allowing users to search …
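Cross-modal retrieval in a unified embedding space works because items from every modality land in the same vector space, so a text query can be ranked against image or video embeddings by similarity alone. A minimal sketch, with toy stand-in vectors rather than Nova MME outputs:

```python
# Sketch of cross-modal retrieval over a unified embedding space.
# The vectors and index entries below are toy stand-ins, not Nova MME output.
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve(query_vec, index):
    """Rank indexed items of any modality by similarity to the query."""
    return sorted(index, key=lambda item: cosine(query_vec, item["vec"]),
                  reverse=True)
```

Because all modalities share one space, the same `retrieve` call serves text-to-image, audio-to-video, or any other cross-modal lookup.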
-
2025: We present MASSIVE-Agents, a new benchmark for assessing multilingual function calling across 52 languages. We created MASSIVE-Agents by cleaning the original MASSIVE dataset and then reformatting it for evaluation within the Berkeley Function-Calling Leaderboard (BFCL) framework. The full benchmark comprises 47,020 samples with an average of 904 samples per language, covering 55 different functions and …
-
2025: Structured information extraction from unstructured text is critical for emerging Software 3.0 systems, where LLM agents autonomously interact with APIs and tools. Recent approaches apply large language models directly to extraction tasks using existing JSON schemas, often with constrained decoding or reinforcement learning to ensure syntactic validity, but treat JSON schemas as static contracts …
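The step this abstract takes for granted, checking an LLM's extraction output against a JSON schema, can be sketched minimally as below. The schema, field names, and hand-rolled checks are assumptions for illustration; production systems use a full JSON Schema validator or constrained decoding rather than post-hoc checks like these.

```python
# Minimal sketch of validating an LLM's structured-extraction output against
# a schema. The schema and checks are illustrative assumptions; real systems
# typically use a complete JSON Schema validator or constrained decoding.
import json

SCHEMA = {
    "required": ["name", "price"],          # keys the extraction must emit
    "types": {"name": str, "price": (int, float)},
}

def validate(raw_output: str, schema=SCHEMA):
    """Parse model output and report any schema violations."""
    try:
        obj = json.loads(raw_output)
    except json.JSONDecodeError:
        return False, ["invalid JSON"]
    errors = [f"missing key: {k}" for k in schema["required"] if k not in obj]
    errors += [f"wrong type for {k}" for k, t in schema["types"].items()
               if k in obj and not isinstance(obj[k], t)]
    return not errors, errors
```

A failed validation can drive a retry loop: feed the error list back to the model and ask it to re-emit conforming JSON.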
-
2025: In the e-commerce domain, the address is the single most important piece of data for ensuring accurate and reliable deliveries. In this two-part study, we first outline the construction of a language model to assist customers with address standardization, and in the latter part we detail a novel Pareto-ensemble multi-task prediction algorithm that derives critical insights from addresses to minimize operational …
Related content
-
May 24, 2018: Amazon scientists are continuously expanding Alexa’s natural-language-understanding (NLU) capabilities to make Alexa smarter, more useful, and more engaging.
-
May 11, 2018: Smart speakers, such as the Amazon Echo family of products, are growing in popularity among consumer and business audiences. To improve the automatic speech recognition (ASR) and full-duplex voice communication (FDVC) performance of these smart speakers, acoustic echo cancellation (AEC) and noise reduction systems are required. These systems reduce the noises and echoes that can impair operation, such as an Echo device’s ability to accurately hear the wake word “Alexa.”
-
May 4, 2018: In recent years, the amount of textual information produced daily has increased exponentially. This information explosion has been accelerated by the ease with which data can be shared across the web. Most textual information is generated as free-form text, and only a small fraction is available in structured formats (Wikidata, Freebase, etc.) that can be processed and analyzed directly by machines.
-
April 25, 2018: This morning, I am delivering a keynote talk at the World Wide Web Conference in Lyon, France, titled “Conversational AI for Interacting with the Digital and Physical World.”
-
April 12, 2018: The Amazon Echo is a hands-free smart home speaker you control with your voice. The first important step in enabling a delightful customer experience with an Echo or other Alexa-enabled device is wake word detection, so accurate detection of “Alexa” or substitute wake words is critical. Building a wake word system with low error rates is challenging when computational resources on the device are limited and background noise, such as speech or music, is present.
-
April 10, 2018: Just as Alexa can wake up without the need to press a button, she also automatically detects when a user finishes her query and expects a response. This task is often called “end-of-utterance detection,” “end-of-query detection,” “end-of-turn detection,” or simply “end-pointing.”