Customer-obsessed science
Research areas
-
June 8, 2026 (7 min read): Four approaches can dramatically improve the performance and trustworthiness of AI agents in operational environments.
-
May 27, 2026 (4 min read): Machine learning
Featured news
-
2026. Multi-agent systems (MAS) are increasingly capable of tackling complex real-world tasks, yet their reliance on inter-agent coordination, tool use, and long-horizon reasoning makes error recognition particularly challenging. Minor errors can propagate across agents, escalating into task failures while producing long, intertwined execution trajectories that impose significant costs for both human developers
-
IEEE Micro, 2026. Despite the non-negligible occurrence of silent data corruption (SDC) during large-scale training of large language models (LLMs), the impact of SDC on training lacks systematic understanding. This article empirically analyzes the connections between different training characteristics and the impact of SDC on LLM training. Using deterministic training workloads on real-world SDC-affected hardware, we quantify SDC
-
2026. Large language model (LLM)-based agents increasingly rely on tool use to complete real-world tasks. While existing work evaluates LLMs' tool-use capability, it largely focuses on final answers and overlooks the detailed tool-usage trajectory, i.e., whether tools are selected, parameterized, and ordered correctly. We introduce TRAJECT-Bench, a trajectory-aware benchmark to comprehensively evaluate
-
ICML 2026 Workshop on Reinforcement Learning from World Feedback, 2026. Training-free verbal reinforcement learning enables LLM agents to learn from world feedback (objective signals such as dynamic task outcomes, market returns, or demand forecasts) by extracting verbal rules from experience and injecting them as context, updating the agent's behavior without parameter changes. However, in non-stationary environments these agents face a retention-forgetting dilemma: retaining
-
2026. Existing deepfake detection techniques struggle to keep up with ever-evolving, novel, unseen forgery methods. This limitation stems from their reliance on statistical artifacts learned during training, which are often tied to specific generation processes that may not be representative of samples from new, unseen deepfake generation methods encountered at test time. We propose that incorporating language
Collaborations
Whether you're a faculty member or student, there are a number of ways you can engage with Amazon.