Customer-obsessed science
- April 27, 2026 · 4 min read: A new framework provides a statistical method for estimating the likelihood of catastrophic failures in large language models in adversarial conversations.
Featured news
- 2025: In real-world NLP applications, large language models (LLMs) offer promising solutions due to their extensive training on vast datasets. However, the large size and high computation demands of LLMs limit their practicality in many applications, especially when further fine-tuning is required. To address these limitations, smaller models are typically preferred for deployment. However, their training is…
- 2025: For music streaming services expanding into audiobooks, cold-start personalization presents a critical challenge: as audiobooks are a newly introduced content type, the vast majority of existing users have no audiobook listening history. This domain-level cold-start scenario differs from traditional item or user cold-start scenarios, since personalization must begin before any behavioral data exists in…
- 2025: Large language models (LLMs) have demonstrated exceptional performance in natural language processing tasks, yet their massive size makes serving them inefficient and costly. Semistructured pruning has emerged as an effective method for model acceleration, but existing approaches are suboptimal because they focus on local, layer-wise optimizations using heuristic rules, failing to leverage global feedback…
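The local, heuristic pruning rules this abstract critiques can be illustrated with the common 2:4 semistructured pattern: in every consecutive group of four weights, the two smallest-magnitude entries are zeroed, using only information local to that group. This is a minimal sketch, not the paper's method; the function name and pure-Python layout are mine.

```python
def prune_2_4(weights):
    """2:4 semistructured sparsity: in each consecutive group of 4 weights,
    zero the two smallest-magnitude entries. A purely local, layer-wise
    heuristic of the kind the abstract argues is suboptimal, since it
    ignores global feedback about which weights matter downstream."""
    out = list(weights)
    for g in range(0, len(out), 4):
        group = out[g:g + 4]
        # rank positions within this group by |weight|, smallest first
        order = sorted(range(len(group)), key=lambda i: abs(group[i]))
        for i in order[:2]:  # drop the two smallest-magnitude weights
            out[g + i] = 0.0
    return out
```

The appeal of the 2:4 pattern is hardware support for skipping the zeroed entries; the cost, as the abstract notes, is that each group's decision is made in isolation.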
- FAIM 2025 (2025): Conveyors play a crucial role in transporting packages and containers in manufacturing and production facilities. While computer vision has emerged as a promising technology for real-time monitoring of transportation systems, its application in conveyor operations remains in the early stages. This paper introduces an Industrial Internet of Things (IIoT) framework for real-time conveyor monitoring. We first…
- AISTATS 2025; NeurIPS 2025 Workshop on Efficient Reasoning (2025): Speculative decoding is an effective technique for accelerating large language model (LLM) inference by drafting multiple tokens in parallel. However, its practical speedup is often limited by a rigid verification step, which strictly enforces that the accepted token distribution exactly matches that of the target model. This constraint leads to the rejection of many plausible tokens, reducing the acceptance…
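The rigid verification step this abstract describes can be sketched with the standard speculative-decoding acceptance rule: each drafted token t is kept with probability min(1, p_target(t) / p_draft(t)), and the first rejection discards the rest of the draft. This is a toy illustration under assumed toy distributions (dicts mapping token to probability), not the paper's proposed relaxation.

```python
import random

def verify_draft(draft_tokens, p_target, p_draft, rng=None):
    """Strict speculative-decoding verification: accept each drafted token t
    with probability min(1, p_target[t] / p_draft[t]), so the accepted
    distribution exactly matches the target model's. p_target and p_draft
    are toy dicts mapping token -> probability (illustrative names)."""
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    accepted = []
    for t in draft_tokens:
        if rng.random() < min(1.0, p_target[t] / p_draft[t]):
            accepted.append(t)
        else:
            break  # strict check: reject and stop, even if t was plausible
    return accepted
```

When the target model assigns at least as much probability to a token as the drafter, the token is always accepted; the abstract's point is that this exact-match constraint also discards many tokens the target model would consider plausible, capping the practical speedup.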
Collaborations
Whether you're a faculty member or student, there are a number of ways you can engage with Amazon.
View all