Customer-obsessed science
Research areas
-
April 27, 20264 min readA new framework provides a statistical method for estimating the likelihood of catastrophic failures in large language models in adversarial conversations.
-
April 15, 20268 min read
-
April 7, 202613 min read
-
April 1, 20265 min read
Featured news
-
ACL 2026 Findings2026Hallucinations in Speech Large Language Models (SpeechLLMs) pose significant risks, yet existing detection methods typically rely on goldstandard outputs that are costly or impractical to obtain. Moreover, hallucination detection methods developed for text-based LLMs do not directly capture audio-specific signals. We investigate four attention-derived metrics: AUDIORATIO, AUDIOCONSISTENCY, AUDIOENTROPY,
-
ICLR 2026 Workshop on Logical Reasoning of Large Language Models2026Finny is a multi-agent system that demonstrates how large language models can perform structured decision-making by applying domain-specific rules to multiple related scenarios. Leveraging foundation models with Retrieval-Augmented Generation (RAG), the system applies Standard Operating Procedures (SOPs) for intelligent forecast refinement at scale. Finny employs a two-stage architecture: a knowledge base
-
2026Aligning Large Language Models (LLM) to address subjectivity and nuanced preference levels requires adequate flexibility and control, which can be a resource-intensive and time-consuming procedure. Existing training-time alignment methods require full re-training when a change is needed and inference-time ones typically require access to the reward model at each inference step. We introduce MEAV, an inference-time
-
ACL 2026 Findings2026The reasoning capabilities of large language models (LLMs) have improved substantially through increased test-time computation, typically in the form of intermediate tokens known as chain-of-thought (CoT). However, CoT often becomes unnecessarily long, increasing computation costs without improving accuracy and sometimes even degrading performance, a phenomenon known as 'overthinking'. We propose a multi-stage
-
2026LLM-based code agents treat repositories as unstructured text, applying edits through brittle string matching that frequently fails due to formatting drift or ambiguous patterns. We propose reframing the codebase as a structured action space where agents operate on named AST entities rather than text spans. Our framework, CODESTRUCT, provides readCode for retrieving complete syntactic units and editCode
Collaborations
View allWhether you're a faculty member or student, there are number of ways you can engage with Amazon.
View all