Customer-obsessed science
Research areas
- November 28, 2025 (4 min read): Large language models are increasing the accuracy, reliability, and consistency of the product catalogue at scale.
- November 20, 2025 (4 min read)
- October 20, 2025 (4 min read)
- October 14, 2025 (7 min read)
- October 2, 2025 (3 min read)
Featured news
- 2025. As machine learning (ML) systems are increasingly deployed in high-stakes domains, the need for robust methods to assess fairness has become more critical. While statistical fairness metrics are widely used due to their simplicity, they are limited in their ability to explain why disparities occur, as they rely on associative relationships in the data. In contrast, causal fairness metrics aim to uncover
- arXiv, 2025. Despite rapid progress in LLM agents, performance on long-horizon, tool-using tasks remains fragile. To better understand this fragility, we ask a simple question: do all actions contribute equally to failure? Analyzing execution traces on τ-Bench (Airline/Retail) and SWE-Bench Verified, we decompose trajectories into mutating (environment-changing) vs. non-mutating steps and formalize decisive deviations, earliest
- NeurIPS 2025 Workshop on Evaluating the Evolving LLM Lifecycle, 2025. Error attribution in Large Language Model (LLM) multi-agent systems presents a significant challenge in debugging and improving collaborative AI systems. Current approaches to pinpointing agent and step level failures in multi-agent interaction traces (whether using all-at-once evaluation, step-by-step analysis, or binary search) fall short when analyzing complex patterns, struggling with both accuracy and
- NeurIPS 2025 Workshop on Recent Advances in Time Series Foundation Models, 2025. Many time series applications require access to multi-step forecast trajectories in the form of sample paths. Recently, time series foundation models have leveraged multi-step lookahead predictions to improve the quality and efficiency of multi-step forecasts. However, these models only predict independent marginal distributions for each time step, rather than a full joint predictive distribution. To generate
- 2025. Music recommendation systems face the dual challenge of capturing both immediate context and long-term preferences in users' listening patterns. We adapt a generalized sequential model architecture for music recommendation, introducing modifications that acknowledge how music preferences combine temporal patterns and stable tastes. By removing causal masking constraints typically used in sequential models
Collaborations
Whether you're a faculty member or student, there are a number of ways you can engage with Amazon.