- NeurIPS 2025 Workshop on Multi-Turn Interactions in Large Language Models, 2025: Agentic tool calling has gained traction, yet most existing work overlooks the complexity of multi-turn tool interactions. We introduce OrchDAG, a synthetic data generation pipeline that models tool execution as directed acyclic graphs (DAGs) with controllable complexity. Using this dataset, we benchmark model performance and propose a graph-based reward to enhance RLVR
- 2025: Text-to-SQL systems translate natural language (NL) questions into SQL queries, enabling non-technical users to interact with structured data. While large language models (LLMs) have shown promising results on the text-to-SQL task, they often produce semantically incorrect yet syntactically valid queries, with limited insight into their reliability. We propose SQLENS, an end-to-end framework for fine-grained
- arXiv, 2025: Despite rapid progress in LLM agents, performance on long-horizon, tool-using tasks remains fragile. To better understand this fragility, we ask a simple question: do all actions contribute equally to failure? Analyzing execution traces on τ-Bench (Airline/Retail) and SWE-Bench Verified, we decompose trajectories into mutating (environment-changing) vs. non-mutating steps and formalize decisive deviations: earliest
- NeurIPS 2025 Workshop on Evaluating the Evolving LLM Lifecycle, 2025: Error attribution in Large Language Model (LLM) multi-agent systems presents a significant challenge in debugging and improving collaborative AI systems. Current approaches to pinpointing agent- and step-level failures in multi-agent interaction traces (whether using all-at-once evaluation, step-by-step analysis, or binary search) fall short when analyzing complex patterns, struggling with both accuracy and
- 2025: This paper introduces, a three-stage multi-agent LLM framework designed to transform unstructured and ambiguous Standard Operating Procedures (SOPs) into a structured plan and an executable code template. Unstructured SOPs, common across industries such as finance, retail, and logistics, frequently suffer from ambiguity, missing information, and inconsistency, all of which hinder automation. We address this
Related content
- December 20, 2023: Novel architectures and carefully prepared training data enable state-of-the-art performance.
- December 19, 2023: Four professors awarded for research in machine learning and robotics; two doctoral candidates awarded fellowships.
- December 11, 2023: Amazon senior principal engineer Luu Tran is helping the Alexa team innovate by collaborating closely with scientist colleagues.
- December 7, 2023: Using gradient diversity to optimize the selection of past samples for retention improves performance while combating catastrophic forgetting.
- December 6, 2023: Research on natural-language understanding seeks to harness the power of large language models, while query reformulation and text summarization emerge as topics of particular interest.
- Source: New York Times, November 16, 2023: Real-world deployment requires notions of fairness that are task relevant and responsive to the available data, recognition of unforeseen variation in the “last mile” of AI delivery, and collaboration with AI activists.