Conversational AI

Building software and systems that help people communicate with computers naturally, as if communicating with family and friends.

OrchDAG: Complex tool orchestration in multi-turn interactions with plan DAGs

Yifu Lu, Shengjie Liu, Li Dong

NeurIPS 2025 Workshop on Multi-Turn Interactions in Large Language Models

2025

Agentic tool use has gained traction with the rise of agentic tool calling, yet most existing work overlooks the complexity of multi-turn tool interactions. We introduce OrchDAG, a synthetic data generation pipeline that models tool execution as directed acyclic graphs (DAGs) with controllable complexity. Using this dataset, we benchmark model performance and propose a graph-based reward to enhance RLVR

Conversational AI
SQLENS: An end-to-end framework for error detection and correction in text-to-SQL

Yue Gong, Chuan Lei, Xiao Qin, Kapil Eknath Vaidya, Balakrishnan (Murali) Narayanaswamy, Tim Kraska

NeurIPS 2025

2025

Text-to-SQL systems translate natural language (NL) questions into SQL queries, enabling non-technical users to interact with structured data. While large language models (LLMs) have shown promising results on the text-to-SQL task, they often produce semantically incorrect yet syntactically valid queries, with limited insight into their reliability. We propose SQLENS, an end-to-end framework for fine-grained

Conversational AI
SABER: Small actions, big errors — Safe-guarding mutating steps in LLM agents

Alex Cuadron Lafuente, Pengfei Yu, Yang Liu, Arpit Gupta

arXiv

2025

Despite rapid progress in LLM agents, performance on long-horizon, tool-using tasks remains fragile. To better understand this fragility, we ask a simple question: do all actions contribute equally to failure? Analyzing execution traces on τ-Bench (Airline/Retail) and SWE-Bench Verified, we decompose trajectories into mutating (environment-changing) vs. non-mutating steps and formalize decisive deviations—earliest

Conversational AI
Where did it all go wrong? A hierarchical look into multi-agent error attribution

Adi Banerjee, Anirudh Nair, Tarik Borogovac

NeurIPS 2025 Workshop on Evaluating the Evolving LLM Lifecycle

2025

Error attribution in Large Language Model (LLM) multi-agent systems presents a significant challenge in debugging and improving collaborative AI systems. Current approaches to pinpointing agent- and step-level failures in multi-agent interaction traces (whether using all-at-once evaluation, step-by-step analysis, or binary search) fall short when analyzing complex patterns, struggling with both accuracy and

Conversational AI
Structuring the unstructured: A multi-agent LLM framework for transforming ambiguous SOPs into code

Sachin Kumar Giroh, Pushpendu Ghosh, Aryan Jain, Harshal Paunikar, Anish Nediyanchath, Aditi Rastogi, Promod Yenigalla

EMNLP 2025

2025

This paper introduces a three-stage multi-agent LLM framework designed to transform unstructured and ambiguous Standard Operating Procedures (SOPs) into a structured plan and an executable code template. Unstructured SOPs, common across industries such as finance, retail, and logistics, frequently suffer from ambiguity, missing information, and inconsistency, all of which hinder automation. We address this

Conversational AI

Amazon Web Services releases two new Titan vision-language models

Larry Hardesty

December 20, 2023

Novel architectures and carefully prepared training data enable state-of-the-art performance.

Computer vision
Amazon and MIT announce Science Hub 2023 gift project awards and fellowships

Staff writer

December 19, 2023

Four professors awarded for research in machine learning and robotics; two doctoral candidates awarded fellowships.

Conversational AI
Writing Alexa’s next chapter by combining engineering and science

Staff writer

December 11, 2023

Amazon senior principal engineer Luu Tran is helping the Alexa team innovate by collaborating closely with scientist colleagues.

Conversational AI
Continual learning in the federated-learning context

Jimit Majmudar, Charith Peris

December 7, 2023

Using gradient diversity to optimize selection of past samples for retention improves performance while combatting catastrophic forgetting.

Machine learning
A quick guide to Amazon's 40+ papers at EMNLP 2023

Staff writer

December 6, 2023

Research on natural-language understanding seeks to harness the power of large language models, while query reformulation and text summarization emerge as topics of particular interest.

Conversational AI

Responsible AI in the wild: Lessons learned at AWS

Michael Kearns, Aaron Roth

November 16, 2023

Real-world deployment requires notions of fairness that are task relevant and responsive to the available data, recognition of unforeseen variation in the “last mile” of AI delivery, and collaboration with AI activists.

Machine learning
