Customer-obsessed science
Research areas
-
June 8, 2026. 7 min read. Four approaches can dramatically improve the performance and trustworthiness of AI agents in operational environments.
-
May 27, 2026. 4 min read. Machine learning
Featured news
-
2025. For music streaming services expanding into audiobooks, cold-start personalization presents a critical challenge: as audiobooks are a newly introduced content type, the vast majority of existing users have no audiobook listening history. This domain-level cold-start scenario differs from traditional item or user cold-start scenarios, since personalization must begin before any behavioral data exists in
-
2025. Some text generation tasks, such as Attribute Value Extraction (AVE), require decoding multiple independent sequences from the same document context. While standard autoregressive decoding is slow due to its sequential nature, the independence between output sequences offers an opportunity for parallelism. We present Hyper-Parallel Decoding, a novel decoding algorithm that accelerates offline decoding by
-
2025. Large Language Models (LLMs) have demonstrated exceptional performance in natural language processing tasks, yet their massive size makes serving them inefficient and costly. Semistructured pruning has emerged as an effective method for model acceleration, but existing approaches are suboptimal because they focus on local, layer-wise optimizations using heuristic rules, failing to leverage global feedback
-
FAIM 2025. 2025. Conveyors play a crucial role in transporting packages and containers in manufacturing and production facilities. While computer vision has emerged as a promising technology for real-time monitoring of transportation systems, its application in conveyor operations remains in the early stages. This paper introduces an Industrial Internet of Things (IIoT) framework for real-time conveyor monitoring. We first
-
AISTATS 2025, NeurIPS 2025 Workshop on Efficient Reasoning. 2025. Speculative decoding is an effective technique for accelerating large language model (LLM) inference by drafting multiple tokens in parallel. However, its practical speedup is often limited by a rigid verification step, which strictly enforces that the accepted token distribution exactly matches that of the target model. This constraint leads to the rejection of many plausible tokens, reducing the acceptance
Collaborations
Whether you're a faculty member or student, there are a number of ways you can engage with Amazon.