Customer-obsessed science
Research areas
- April 8, 2026 (6 min read): Amazon’s RuleForge system uses agentic AI to generate production-ready detection rules 336% faster than traditional methods.
- April 7, 2026 (13 min read)
- March 20, 2026 (15 min read)
- March 19, 2026 (11 min read)
Featured news
- ACL 2026 Findings, 2026. Recent Long-Context Language Models (LCLMs) can process hundreds of thousands of tokens in a single prompt, enabling new opportunities for knowledge-intensive multi-hop reasoning by integrating large sets of retrieved documents or, in some cases, directly all necessary information. However, simply feeding more documents into the context window fails to capture how evidence should be connected. We address
- ACL 2026 Findings, 2026. Hallucinations in Speech Large Language Models (SpeechLLMs) pose significant risks, yet existing detection methods typically rely on gold-standard outputs that are costly or impractical to obtain. Moreover, hallucination detection methods developed for text-based LLMs do not directly capture audio-specific signals. We investigate four attention-derived metrics: AUDIORATIO, AUDIOCONSISTENCY, AUDIOENTROPY,
- ICLR 2026 Workshop on Logical Reasoning of Large Language Models, 2026. Finny is a multi-agent system that demonstrates how large language models can perform structured decision-making by applying domain-specific rules to multiple related scenarios. Leveraging foundation models with Retrieval-Augmented Generation (RAG), the system applies Standard Operating Procedures (SOPs) for intelligent forecast refinement at scale. Finny employs a two-stage architecture: a knowledge base
- 2026. Aligning large language models (LLMs) to address subjectivity and nuanced preference levels requires adequate flexibility and control, which can be a resource-intensive and time-consuming procedure. Existing training-time alignment methods require full re-training when a change is needed, and inference-time ones typically require access to the reward model at each inference step. We introduce MEAV, an inference-time
- ACL 2026 Findings, 2026. The reasoning capabilities of large language models (LLMs) have improved substantially through increased test-time computation, typically in the form of intermediate tokens known as chain-of-thought (CoT). However, CoT often becomes unnecessarily long, increasing computation costs without improving accuracy and sometimes even degrading performance, a phenomenon known as 'overthinking'. We propose a multi-stage
Collaborations
Whether you're a faculty member or a student, there are a number of ways you can engage with Amazon.