Customer-obsessed science


Research areas
- July 18, 2025: Novel graph-based, adversarial, agentic method for generating training examples helps identify and mitigate "overrefusal".
Featured news
- 2024: Query Auto-Complete (QAC) is an essential search feature that presents users with a list of potential search-keyword completions as they type, enabling them to complete their queries faster. While QAC systems in e-commerce stores generally use the Learning-to-Rank (LTR) approach, optimized on customer feedback, they struggle to provide diverse suggestions, leading to repetitive queries and limited … (see sketch 1 after this list)
- 2024: Entity matching is the task of linking records from different sources that refer to the same real-world entity. Past work has primarily treated entity matching as a standard supervised learning problem. However, supervised entity-matching models often do not generalize well to new data, and collecting exhaustive labeled training data is often cost-prohibitive. Further, recent efforts have adopted LLMs for … (see sketch 2 after this list)
- 2024: Training with mixed data distributions is a common and important part of creating multi-task and instruction-following models. The diversity of the data distributions and the cost of joint training make the optimization procedure extremely challenging. Data-mixing methods partially address this problem, albeit with suboptimal performance across data sources, and they require multiple expensive training runs (see sketch 3 after this list).
- 2024: Retrieval models are often evaluated on partially annotated datasets. Each query is mapped to a few relevant texts, and the remaining corpus is assumed to be irrelevant. As a result, models that successfully retrieve falsely labeled negatives are punished in evaluation. Unfortunately, completely annotating all texts for every query is not resource-efficient. In this work, we show that using partially annotated … (see sketch 4 after this list)
- 2024: We propose a constraint-learning schema for fine-tuning Large Language Models (LLMs) with attribute control. Given a training corpus and control criteria formulated as a sequence-level constraint on model outputs, our method fine-tunes the LLM on the training corpus while enhancing constraint satisfaction with minimal impact on its utility and generation quality. Specifically, our approach regularizes the … (see sketch 5 after this list)
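Sketch 1, for the QAC item above: a minimal, hypothetical illustration of diversifying LTR-ranked suggestions with a greedy, maximal-marginal-relevance-style re-ranker. The token-overlap similarity, scores, and trade-off weight `lam` are assumptions for illustration; the abstract does not specify the paper's actual method.

```python
def jaccard(a: set, b: set) -> float:
    """Token-set overlap between two suggestions."""
    return len(a & b) / len(a | b) if a | b else 0.0

def diverse_rerank(suggestions, scores, k=5, lam=0.7):
    """Greedily pick k suggestions, trading LTR score against novelty."""
    tokens = {s: set(s.split()) for s in suggestions}
    pool = dict(zip(suggestions, scores))
    chosen = []
    while pool and len(chosen) < k:
        best = max(
            pool,
            key=lambda s: lam * pool[s]
            - (1 - lam) * max((jaccard(tokens[s], tokens[c]) for c in chosen), default=0.0),
        )
        chosen.append(best)
        del pool[best]
    return chosen

# Toy usage: near-duplicate completions get pushed down by the penalty.
print(diverse_rerank(
    ["iphone case", "iphone case clear", "iphone charger", "iphone 15 case"],
    [0.9, 0.85, 0.6, 0.8], k=3))
```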
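Sketch 2, for the entity-matching item: one common way to apply an LLM to entity matching is to serialize both records into a prompt and parse a yes/no answer. `call_llm` below is a stand-in for whatever model endpoint is used, and the prompt wording is an assumption, not the paper's.

```python
def serialize(record: dict) -> str:
    return "; ".join(f"{k}: {v}" for k, v in sorted(record.items()))

def match_prompt(left: dict, right: dict) -> str:
    return (
        "Do these two product records refer to the same real-world entity?\n"
        f"Record A: {serialize(left)}\n"
        f"Record B: {serialize(right)}\n"
        "Answer 'yes' or 'no'."
    )

def is_match(left: dict, right: dict, call_llm) -> bool:
    # call_llm: str -> str, e.g. a wrapper around a hosted model (assumption).
    return call_llm(match_prompt(left, right)).strip().lower().startswith("yes")

# Usage with a trivial stand-in "model":
fake_llm = lambda prompt: "yes"
print(is_match({"title": "USB-C cable 1m"}, {"title": "1 m USB C cable"}, fake_llm))
```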
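Sketch 3, for the data-mixing item: the baseline the abstract alludes to is static mixing, where batches are sampled from several sources under fixed mixture weights. The sources and weights here are invented; finding good weights is exactly what typically costs multiple training runs.

```python
import random

sources = {
    "web":    ["web_ex_%d" % i for i in range(1000)],
    "code":   ["code_ex_%d" % i for i in range(1000)],
    "dialog": ["dialog_ex_%d" % i for i in range(1000)],
}
weights = {"web": 0.6, "code": 0.25, "dialog": 0.15}  # assumed mixture

def sample_batch(batch_size=8, seed=0):
    """Draw a batch: pick a source by weight, then an example from it."""
    rng = random.Random(seed)
    names = list(sources)
    probs = [weights[n] for n in names]
    return [rng.choice(sources[rng.choices(names, probs)[0]]) for _ in range(batch_size)]

print(sample_batch())
```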
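Sketch 4, for the retrieval-evaluation item: a worked example of the pitfall the abstract describes. With partial annotations, a retrieved text that is relevant but unlabeled counts as a miss. The corpus, labels, and ranking below are invented for illustration.

```python
def recall_at_k(ranked_ids, labeled_relevant, k=3):
    hits = sum(1 for doc in ranked_ids[:k] if doc in labeled_relevant)
    return hits / len(labeled_relevant)

ranked = ["d7", "d2", "d9"]          # model's top-3 for some query
labeled = {"d2", "d4"}               # annotated positives (incomplete)
truly_relevant = {"d2", "d4", "d7"}  # d7 is a false negative in the labels

print(recall_at_k(ranked, labeled))         # 0.5 -- penalized for retrieving d7
print(recall_at_k(ranked, truly_relevant))  # ~0.67 with complete labels
```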
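Sketch 5, for the constrained fine-tuning item: one generic way to regularize toward a sequence-level constraint is to add a violation penalty to the usual language-modeling loss. The penalty form, the violation estimate, and `lam` are assumptions; the abstract only says the method regularizes training toward constraint satisfaction.

```python
import torch

def constrained_loss(lm_loss, violation_prob, lam=0.5):
    """lm_loss: standard next-token loss; violation_prob: estimated
    probability (in [0, 1]) that a sampled output violates the constraint."""
    return lm_loss + lam * violation_prob

lm_loss = torch.tensor(2.3)
violation = torch.tensor(0.4)  # e.g. fraction of sampled outputs violating
print(constrained_loss(lm_loss, violation))  # tensor(2.5000)
```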
Academia
Whether you're a faculty member or a student, there are a number of ways you can engage with Amazon.