Customer-obsessed science
Research areas
- June 24, 2026 | 5 min read | Millimeter-scale particles of nuclear-reactor fuel are encased in four layers of different materials that act as a “miniature containment system.”
Featured news
- 2025 | Large Language Models (LLMs) have demonstrated exceptional performance in natural language processing tasks, yet their massive size makes serving them inefficient and costly. Semistructured pruning has emerged as an effective method for model acceleration, but existing approaches are suboptimal because they focus on local, layer-wise optimizations using heuristic rules, failing to leverage global feedback
- FAIM 2025 | 2025 | Conveyors play a crucial role in transporting packages and containers in manufacturing and production facilities. While computer vision has emerged as a promising technology for real-time monitoring of transportation systems, its application in conveyor operations remains in the early stages. This paper introduces an Industrial Internet of Things (IIoT) framework for real-time conveyor monitoring. We first
- AISTATS 2025, NeurIPS 2025 Workshop on Efficient Reasoning | 2025 | Speculative decoding is an effective technique for accelerating large language model (LLM) inference by drafting multiple tokens in parallel. However, its practical speedup is often limited by a rigid verification step, which strictly enforces that the accepted token distribution exactly matches that of the target model. This constraint leads to the rejection of many plausible tokens, reducing the acceptance
- 2025 | Since the seminal work of TabPFN, research on tabular foundation models (TFMs) based on in-context learning (ICL) has challenged long-standing paradigms in machine learning. Without seeing any real-world data, models pretrained on purely synthetic datasets generalize remarkably well across diverse datasets, often using only a moderate number of in-context examples. This shifts the focus in tabular machine
- ICLR 2025 Workshop on Resource-Adaptive Foundation Model Inference (AdaptFM) | 2025 | Multi-model inference systems (whether based on routing, cascading, or unified strategies) often rely on confidence signals to decide when a small language model (SLM) output should be accepted or deferred. While such signals are commonly used in classification and short-form generation, their reliability in structured generation settings remains poorly understood. In this work, we study log-probability confidence
Collaborations
Whether you're a faculty member or student, there are a number of ways you can engage with Amazon.