Customer-obsessed science
Research areas
- April 27, 2026, 4 min read: A new framework provides a statistical method for estimating the likelihood of catastrophic failures in large language models in adversarial conversations.
- April 15, 2026, 8 min read
- April 7, 2026, 13 min read
- April 1, 2026, 5 min read
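The April 27 item above concerns estimating the probability of rare catastrophic failures from adversarial trials. As a baseline point of reference only (this is a standard textbook bound, not the paper's framework, and the function name is ours), the zero-failure binomial upper bound fits in a few lines:

```python
def zero_failure_upper_bound(n_trials: int, confidence: float = 0.95) -> float:
    """Exact binomial (Clopper-Pearson) upper bound on a failure
    probability when n_trials independent trials produced no failure.
    Standard textbook bound, not the paper's method; name is ours."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n_trials)

bound = zero_failure_upper_bound(100)
# For 100 clean adversarial conversations, the 95% upper bound is ~0.03,
# matching the familiar "rule of three" (about 3 / n).
```

More trials with no observed failure tighten the bound, which is why such estimates are usually reported alongside the number of adversarial conversations run.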
Featured news
- CLeaR 2026: In this paper we show how to exploit interventional data to acquire the joint conditional distribution of all the variables using the Maximum Entropy principle. To this end, we extend the Causal Maximum Entropy method to make use of data arising from identifiable interventional distributions in addition to data from the observational distribution. Using Lagrange duality, we prove that the solution to the …
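The abstract above invokes the Maximum Entropy principle and Lagrange duality; in the dual, the solution takes a Gibbs (exponential-family) form. A minimal self-contained sketch of that general idea, not the paper's causal method: fit the maximum-entropy distribution on a finite support under a mean constraint by bisecting on the single Lagrange multiplier.

```python
import math

def max_entropy_dist(support, target_mean, lo=-50.0, hi=50.0, tol=1e-10):
    """Maximum-entropy distribution on `support` with a fixed mean.
    By duality, the solution is the Gibbs form p_i ~ exp(lam * x_i);
    we find the multiplier lam matching the mean by bisection, since
    the Gibbs mean is monotone increasing in lam."""
    def mean_for(lam):
        w = [math.exp(lam * x) for x in support]
        z = sum(w)
        return sum(x * wi for x, wi in zip(support, w)) / z

    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean_for(mid) < target_mean:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    w = [math.exp(lam * x) for x in support]
    z = sum(w)
    return [wi / z for wi in w]

p = max_entropy_dist([0, 1, 2, 3], target_mean=1.5)
# With the target mean at the midpoint of the support, maximum
# entropy recovers the uniform distribution (lam = 0).
```

The paper's contribution is to add constraints derived from interventional as well as observational data; this sketch shows only the single-constraint dual machinery.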
- ICSE 2026: Large Language Models (LLMs) are increasingly integrated into software systems as automated decision-making components. These systems rely on instruction prompts written in natural language to encode complex workflows. However, debugging these prompts when LLMs produce undesired outputs remains challenging due to their black-box nature and the impracticality of manually inspecting large, complex inputs.
- CSER 2026: Construction management systems require realistic test data capturing complex stakeholder interactions and temporal dependencies, yet accessing real project data remains challenging due to privacy constraints and proprietary information protection. This research addresses a critical systems engineering challenge by introducing agentic simulacra patterns that leverage multi-agent coordination to generate …
- 2026: Multi-Agent Debate (MAD) frameworks improve factual reliability in large language models (LLMs) by allowing agents to critique and refine one another's reasoning. Yet, existing MAD systems are computationally expensive and prone to degradation under prolonged debates due to redundant exchanges and unstable judging. We propose a lightweight, industry-deployable alternative that unifies Selective Debate Initiation …
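A toy illustration of the Selective Debate Initiation idea named above: poll the agents once and only run costly debate rounds when their first answers disagree. This is a hypothetical sketch, not the paper's system; the agents are deterministic stand-ins for LLM calls, and a majority vote stands in for a learned judge.

```python
from collections import Counter

def selective_debate(question, agents, max_rounds=2):
    """Selective Debate Initiation (hypothetical sketch): skip the
    debate entirely when the agents' initial answers already agree."""
    answers = [agent(question, context=[]) for agent in agents]
    if len(set(answers)) == 1:          # consensus: no debate needed
        return answers[0], 0
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        # each agent revises its answer after seeing the others' answers
        answers = [agent(question, context=list(answers)) for agent in agents]
        if len(set(answers)) == 1:
            break
    # simple majority vote stands in for an unstable learned judge
    return Counter(answers).most_common(1)[0][0], rounds

# Deterministic stand-ins for LLM agents:
stubborn = lambda q, context: "Paris"
follower = lambda q, context: context[0] if context else "Lyon"

verdict, rounds = selective_debate("Capital of France?", [stubborn, stubborn, follower])
# Initial disagreement triggers exactly one debate round, after which
# the follower aligns with the majority.
```

Capping `max_rounds` and skipping debates on easy consensus cases is what makes this kind of loop cheap enough to deploy, which is the efficiency concern the abstract raises.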
- ICLR 2026 Workshop on Lifelong Agents: For large language models deployed through black-box APIs, recurring inference costs often dominate one-time training costs, motivating composed agentic systems that amortize expensive reasoning into reusable intermediate representations. We study a broad class of such systems, termed Guide–Core Policies (GCOP), in which a guide model generates a structured strategy that is executed by a black-box core …
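The Guide–Core pattern described above amortizes one expensive "guide" call into a reusable structured strategy that a cheap "core" executor applies per query. A hypothetical sketch of that amortization, not the paper's API: caching stands in for the reusable intermediate representation, and both models are stubs rather than LLM calls.

```python
from functools import lru_cache

GUIDE_CALLS = 0  # counts expensive guide invocations

@lru_cache(maxsize=None)
def guide(task_family: str) -> tuple:
    """Expensive 'guide' reasoning, run once per task family; the
    cache stands in for the reusable intermediate representation."""
    global GUIDE_CALLS
    GUIDE_CALLS += 1
    # A structured strategy; a real guide model would generate this.
    return ("lowercase", "strip", "dedupe")

def core(strategy: tuple, text: str) -> str:
    """Cheap 'core' executor that follows the guide's strategy."""
    for step in strategy:
        if step == "lowercase":
            text = text.lower()
        elif step == "strip":
            text = text.strip()
        elif step == "dedupe":
            text = " ".join(dict.fromkeys(text.split()))
    return text

def solve(task_family: str, query: str) -> str:
    return core(guide(task_family), query)

out1 = solve("clean-text", "  Hello HELLO world  ")
out2 = solve("clean-text", "  A a B  ")
# The guide ran once; its strategy was reused for both queries.
```

The cost argument in the abstract maps directly onto this structure: guide calls are paid once per task family, while per-query cost is only the cheap core execution.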
Collaborations
Whether you're a faculty member or a student, there are a number of ways you can engage with Amazon.