Customer-obsessed science
- May 14, 2026 · 16 min read: By focusing on specific failure points and suggesting targeted solutions, a new automated prompt-engineering framework improves prompt performance without compromising existing functionality.
- April 15, 2026 · 8 min read
- April 7, 2026 · 13 min read
Featured news
- ICLR 2026 Workshop on Lifelong Agents, 2026: For large language models deployed through black-box APIs, recurring inference costs often dominate one-time training costs, motivating composed agentic systems that amortize expensive reasoning into reusable intermediate representations. We study a broad class of such systems, termed Guide–Core Policies (GCOP), in which a guide model generates a structured strategy that is executed by a black-box core
- 2026: Video Large Language Models (VideoLLMs) excel at video understanding tasks where outputs are textual, such as Video Question Answering and Video Captioning. However, they underperform specialized embedding-based models in retrieval tasks, such as Text-to-Video Retrieval and Moment Retrieval. We introduce ViLL-E (Video-LLM-Embed), a unified VideoLLM architecture endowed with a novel embedding generation
- ICASSP 2026: Streaming automatic speech recognition (ASR) systems based on Large Language Models (LLMs) face a fundamental trade-off between accuracy and latency. Existing approaches typically employ fixed-size chunking to maintain low latency, which often compromises recognition accuracy. We propose SCALE, a streaming ASR framework that addresses this challenge through three key techniques: (a) dynamic chunk boundary
- AISTATS 2026: A/B tests in online experiments face statistical power challenges when testing multiple candidates simultaneously, while adaptive experimental designs (AED) alone fall short in inferring experiment statistics such as the average treatment effect, especially with many metrics (e.g., revenue, safety) and heterogeneous variances. This paper proposes a fixed-budget multi-metric AED framework with a two-phase
- 2026: Large Language Model (LLM)-based Multi-Agent Systems (MAS) enable complex problem-solving but introduce significant debugging challenges, characterized by long interaction traces, inter-agent dependencies, and delayed error manifestation. Existing diagnostic approaches often rely on expensive expert annotation or "LLM-as-a-judge" paradigms, which struggle to pinpoint decisive error steps within extended
Collaborations
Whether you're a faculty member or a student, there are a number of ways you can engage with Amazon.