Customer-obsessed science
Research areas
-
December 5, 20256 min readA multiagent architecture separates data perception, tool knowledge, execution history, and code generation, enabling ML automation that works with messy, real-world inputs.
-
-
-
November 20, 20254 min read
-
Featured news
-
2025High-quality content is critical for driving customer satisfaction and conversions across digital platforms and e-commerce. Ensuring that essential information is complete, accurate, and aligned with customer expectations presents a significant challenge at scale. Existing approaches to content evaluation often treat all information uniformly, without prioritizing based on customer relevance, and rely heavily
-
2025We present MASSIVE-Agents, a new benchmark for assessing multilingual function calling across 52 languages. We created MASSIVE-Agents by cleaning the original MASSIVE dataset and then reformatting it for evaluation within the Berkeley Function-Calling Leaderboard (BFCL) framework. The full benchmark comprises 47,020 samples with an average of 904 samples per language, covering 55 different functions and
-
2025Structured information extraction from unstructured text is critical for emerging Software 3.0 systems where LLM agents autonomously interact with APIs and tools. Recent approaches apply large language models directly to extraction tasks using existing JSON schemas, often with constraint decoding or reinforcement learning approaches to ensure syntactic validity, but treat JSON schemas as static contracts
-
2025For an e-commerce domain, the address is the single most important piece of data for ensuring accurate and reliable deliveries. In this two-part study, we first outline the construction of a language model to assist customers with address standardization and in the latter part, we detail a novel Pareto-ensemble multi-task prediction algorithm that derives critical insights from addresses to minimize operational
-
2025We introduce a novel, training free cascade for auto-prompting Large Language Models (LLMs) to assess product quality in e-commerce. Our system requires no training labels or model fine-tuning, instead automatically generating and refining prompts for evaluating attribute quality across tens of thousands of product category–attribute pairs. Starting from a seed of human-crafted prompts, the cascade progressively
Collaborations
View allWhether you're a faculty member or student, there are number of ways you can engage with Amazon.
View all