Customer-obsessed science


Research areas
-
June 25, 2025With large datasets, directly generating data ID codes from query embeddings is much more efficient than performing pairwise comparisons between queries and candidate responses.
Featured news
-
2025As large language models (LLMs) become increasingly versatile, numerous large scale benchmarks have been developed to thoroughly assess their capabilities. These benchmarks typically consist of diverse datasets and prompts to evaluate different aspects of LLM performance. However, comprehensive evaluations on hundreds or thousands of prompts incur tremendous costs in terms of computation, money, and time
-
2025In this paper, we study the problem of estimation and learning under temporal distribution shift. Consider an observation sequence of length n, which is a noisy realization of a time-varying ground-truth sequence. Our focus is to develop methods to estimate the ground-truth at the final time-step while providing sharp point-wise estimation error rates. We show that, without prior knowledge on the level
-
2025Can we efficiently choose the best Anomaly Detection (AD) algorithm for a data-stream without requiring anomaly labels? Streaming anomaly detection is hard. SOTA AD algorithms are sensitive to their hyper-parameters and no single method works well on all datasets. The best algorithm/hyper-parameter combination for a given data-stream can change over time with data drift. ‘What is an anomaly?’ is often application
-
2025In Amazon robotic warehouses, the destination-to-chute mapping problem is crucial for efficient package sorting. Often, however, this problem is complicated by uncertain and dynamic package induction rates, which can lead to increased package recirculation. To tackle this challenge, we introduce a Distributionally Robust Multi-Agent Reinforcement Learning (DRMARL) framework that learns a destination-to-chute
-
2025How to best develop foundational models for time series forecasting remains an important open question. Tokenization is a crucial consideration in this effort: what is an effective discrete vocabulary for a real-valued sequential input? To address this question, we develop WaveToken, a wavelet-based tokenizer that allows models to learn complex representations directly in the space of time localized frequencies
Academia
View allWhether you're a faculty member or student, there are number of ways you can engage with Amazon.
View all