Customer-obsessed science
Research areas
-
January 13, 20267 min readLeveraging existing environment simulators and reward functions based on verifiable ground truth boosts task success rate, even with small models and small training datasets.
-
December 29, 20256 min read
-
December 29, 20259 min read
-
December 8, 20258 min read
-
December 5, 20256 min read
Featured news
-
2025We introduce a novel, training free cascade for auto-prompting Large Language Models (LLMs) to assess product quality in e-commerce. Our system requires no training labels or model fine-tuning, instead automatically generating and refining prompts for evaluating attribute quality across tens of thousands of product category–attribute pairs. Starting from a seed of human-crafted prompts, the cascade progressively
-
NeurIPS 2025 Workshop on Multi-Turn Interactions in Large Language Models2025Simulating step-wise human behavior with Large Language Models (LLMs) has become an emerging research direction, enabling applications in various practical domains. While prior methods, including prompting, supervised fine-tuning (SFT), and reinforcement learning (RL), have shown promise in modeling step-wise behavior, they primarily learn a population-level policy without conditioning on a user's persona
-
2025Recent advancements in large language models (LLMs) have enabled their successful application to a broad range of tasks. However, in information-intensive tasks, the prompt length can grow fast, leading to increased computational requirements, performance degradation, and induced biases from irrelevant or redundant information. Recently, various prompt compression techniques have been introduced to optimize
-
2025Retrieval-Augmented Generation (RAG) has significantly mitigated the hallucinations of Large Language Models (LLMs) by grounding the generation with external knowledge. Recent extensions of RAG to graph-based retrieval offer a promising direction, leveraging the structural knowledge for multi-hop reasoning. However, existing graph RAG typically decouples retrieval and reasoning processes, which prevents
-
NeurIPS 2025 Workshop on Foundations of Reasoning in Language Models2025Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a powerful framework for enhancing the reasoning capabilities of large language models (LLMs). However, existing approaches such as Group Relative Policy Optimization (GRPO) and its variants, while effective on reasoning benchmarks, struggle with agentic tasks that require iterative decision-making and refinement. We introduce MURPHY,
Collaborations
View allWhether you're a faculty member or student, there are number of ways you can engage with Amazon.
View all