Customer-obsessed science
- March 20, 2026 (15 min read): Simplifying and clarifying the assembly code for core operations enabled automated optimization and verification.
Featured news
- BIG.AI@MIT 2026: Large language models (LLMs) are increasingly deployed in real-world applications such as chatbots, writing assistants, and text summarization tools. As these applications become more central to user-facing tasks, robust evaluation of their performance becomes critical, not only for ensuring quality but also for guiding continuous improvement. Traditional evaluation approaches rely on intrinsic metrics…
- AAAI 2026 Workshop on Shaping Responsible Synthetic Data in the Era of Foundation Models (2026): Product information extraction is crucial for e-commerce services, but obtaining high-quality labeled datasets remains challenging. We present a systematic approach for generating synthetic e-commerce product data using Large Language Models (LLMs), introducing a controlled modification framework with three strategies: attribute-preserving modification, controlled negative example generation, and systematic…
- 2026: Large Language Model (LLM) judges exhibit strong reasoning capabilities but are limited to textual content. This leaves current automatic Speech-to-Speech (S2S) evaluation methods reliant on opaque and expensive Audio Language Models (ALMs). In this work, we propose TRACE (Textual Reasoning over Audio Cues for Evaluation), a novel framework that enables LLM judges to reason over audio cues to achieve cost-efficient…
- EACL 2026, NeurIPS 2025 Workshop on Continual and Compatible Foundation Model Updates (2026): Large Language Models (LLMs) suffer from severe catastrophic forgetting when adapted sequentially to new tasks in a continual learning (CL) setting. Existing approaches are fundamentally limited: replay-based methods are impractical and could potentially violate privacy, while strict orthogonality-based methods collapse under scale, as each new task is projected onto an orthogonal complement, progressively…
- 2026: Reinforcement learning (RL) has re-emerged as a natural approach for training interactive LLM agents in real-world environments. However, directly applying the widely used Group Relative Policy Optimization (GRPO) algorithm to multi-turn tasks exposes notable limitations, particularly in scenarios requiring long-horizon reasoning. To address these challenges, we investigate more stable and effective advantage…
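The limitation described above concerns how GRPO assigns credit within a group of sampled responses. As a rough illustration (not the paper's proposed method, and the function name is our own), standard GRPO computes each response's advantage by standardizing its reward against the other responses sampled for the same prompt:

```python
def group_relative_advantages(rewards, eps=1e-8):
    """Standardize each reward against its group's mean and
    standard deviation, as in vanilla GRPO."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    # eps guards against division by zero when all rewards are equal
    return [(r - mean) / (std + eps) for r in rewards]

# Four sampled responses to one prompt, two rewarded and two not:
advs = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Because every response in the group shares one scalar reward, this per-group standardization gives no signal about which individual turns in a long multi-turn episode were responsible for the outcome, which is one reason long-horizon tasks strain the vanilla formulation.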
Collaborations
Whether you're a faculty member or student, there are a number of ways you can engage with Amazon.
View all