Customer-obsessed science
Research areas
-
June 3, 2026, 4 min read. Automatically fact-checking long, AI-generated research reports poses new challenges, including benchmarking.
-
May 26, 2026, 5 min read
-
May 14, 2026, 16 min read
Featured news
-
KDD 2026, 2026. With the rapid advancement of powerful large language models (LLMs) in recent years, a wide range of software engineering tasks can now be addressed using LLMs, significantly enhancing productivity and scalability. Numerous benchmark datasets have been developed to evaluate the coding capabilities of these models, but they primarily focus on code generation and issue-resolution tasks. In contrast, we
-
We present ReSuMe, a general framework for mutual enhancement of dense retrieval systems and document summarizers through reinforcement learning. The framework jointly optimizes a language model for generating retrieval-oriented summaries and adapts the retrieval model to these summaries through alternating fine-tuning phases. We employ Group Relative Policy Optimization (GRPO) to fine-tune the language
-
IEEE SusTech 2026, 2026. In this paper, we present a comprehensive system-level approach to advancing device sustainability through power optimization for smart home devices, with a detailed case study of Amazon's Echo Pop. Through Lifecycle Assessment (LCA), we identified that Echo Pop generates an estimated 42 kg CO2e over its product lifetime, with 24 kg CO2e (57%) attributed to use-phase emissions, highlighting the critical
-
2026. While Large Language Models excel at reasoning and language understanding, they struggle with multi-step operational workflows requiring precise procedural adherence, which is fundamental for industrial automation. Existing SOP-guided agents assume well-defined procedures and structured APIs, failing to address enterprise realities like incomplete SOPs, dynamic web interfaces, and unpredictable document
-
2026. Activation outliers in large-scale transformer models pose a fundamental challenge to model quantization, creating excessively large ranges that cause severe accuracy drops during quantization. We empirically observe that outlier severity intensifies with pre-training scale (e.g., progressing from CLIP to the more extensively trained SigLIP and SigLIP2). Through theoretical analysis as well as empirical
Collaborations
Whether you're a faculty member or student, there are a number of ways you can engage with Amazon.