Customer-obsessed science
Research areas
-
May 15, 20265 min readA new scaling law that relates particular architectural choices to loss helps identify models that improve throughput by up to 47% with no loss of accuracy.
-
May 14, 202616 min read
-
-
April 15, 20268 min read
Featured news
-
2026Multi-Agent Debate (MAD) frameworks improve factual reliability in large language models (LLMs) by allowing agents to critique and refine one another's reasoning. Yet, existing MAD systems are computationally expensive and prone to degradation under prolonged debates due to redundant exchanges and unstable judging. We propose a lightweight, industry-deployable alternative that unifies Selective Debate Initiation
-
ICLR 2026 Workshop on Lifelong Agents2026For large language models deployed through black-box APIs, recurring inference costs often dominate one-time training costs, motivating composed agentic systems that amortize expensive reasoning into reusable intermediate representations. We study a broad class of such systems, termed Guide–Core Policies (GCOP), in which a guide model generates a structured strategy that is executed by a black-box core
-
2026Video Large Language Models (VideoLLMs) excel at video understanding tasks where outputs are textual, such as Video Question Answering and Video Captioning. However, they underperform specialized embedding-based models in Retrieval tasks, such as Text-to-Video Retrieval and Moment Retrieval. We introduce ViLL-E (Video-LLM-Embed), a unified VideoLLM architecture endowed with a novel embedding generation
-
ICASSP 20262026Streaming automatic speech recognition (ASR) systems based on Large Language Models (LLMs) face a fundamental trade-off between accuracy and latency. Existing approaches typically employ fixed-size chunking to maintain low latency, which often compromises recognition accuracy. We propose SCALE, a streaming ASR framework that addresses this challenge through three key techniques: (a) dynamic chunk boundary
-
AISTATS 20262026A/B tests in online experiments face statistical power challenges when testing multiple candidates simultaneously, while adaptive experimental designs (AED) alone fall short in inferring experiment statistics such as the average treatment effect, especially with many metrics (e.g., revenue, safety) and heterogeneous variances. This paper proposes a fixed-budget multi-metric AED framework with a two-phase
Collaborations
View allWhether you're a faculty member or student, there are number of ways you can engage with Amazon.
View all