Customer-obsessed science


Research areas
-
August 4, 2025Translating from natural to structured language, defining truth, and definitive reasoning remain topics of central concern in automated reasoning, but Amazon Web Services’ new Automated Reasoning checks help address all of them.
Featured news
-
Instruction following is a key capability for LLMs. However, recent studies have shown that LLMs often struggle with instructions containing multiple constraints (e.g. a request to create a social media post “in a funny tone” with “no hashtag”). Despite this, most evaluations focus solely on synthetic data. To address this, we introduce RealInstruct, the first benchmark designed to evaluate LLMs’ ability
-
2024Query Auto-Complete (QAC) is an essential search feature that suggests users with a list of potential search keyword completions as they type, enabling them to complete their queries faster. While the QAC systems in eCommerce stores generally use the Learning to Rank (LTR) approach optimized based on customer feedback, it struggles to provide diverse suggestions, leading to repetitive queries and limited
-
2024Entity matching is the task of linking records from different sources that refer to the same real-world entity. Past work has primarily treated entity linking as a standard supervised learning problem. However, supervised entity matching models often do not generalize well to new data, and collecting exhaustive labeled training data is often cost prohibitive. Further, recent efforts have adopted LLMs for
-
2024Training with mixed data distributions is a common and important part of creating multi-task and instruction-following models. The diversity of the data distributions and cost of joint training makes the optimization procedure extremely challenging. Data mixing methods partially address this problem, albeit having a suboptimal performance across data sources and require multiple expensive training runs.
-
2024Retrieval models are often evaluated on partially-annotated datasets. Each query is mapped to a few relevant texts and the remaining corpus is assumed to be irrelevant. As a result, models that successfully retrieve falsely labeled negatives are punished in evaluation. Unfortunately, completely annotating all texts for every query is not resource efficient. In this work, we show that using partially-annotated
Academia
View allWhether you're a faculty member or student, there are number of ways you can engage with Amazon.
View all