Customer-obsessed science


Research areas
-
September 2, 2025: Audible's ML algorithms connect users directly to relevant titles, reducing the number of purchase steps for millions of daily users.
Featured news
-
RecSys 2025, 2025: Audio streaming services, on both voice assistants and in visual apps, often field requests such as 'play more like Foo Fighters.' The service then returns a sequence of tracks that is both relevant to the request and personalized to the requester. While it is natural to evaluate the policies that produce these sequences in terms of customer engagement, such metrics do not assess their performance on other
-
2025: Behavioral therapy notes are important for both legal compliance and patient care. Unlike progress notes in physical health, quality standards for behavioral therapy notes remain underdeveloped. To address this gap, we collaborated with licensed therapists to design a comprehensive rubric for evaluating therapy notes across key dimensions: completeness, conciseness, and faithfulness. Further, we extend
-
2025: We present a novel large-scale dataset for defect detection in a logistics setting. Recent work on industrial anomaly detection has primarily focused on manufacturing scenarios with highly controlled poses and a limited number of object categories. Existing benchmarks like MVTec-AD [6] and VisA [33] have reached saturation, with state-of-the-art methods achieving up to 99.9% AUROC scores. In contrast to
-
ACL 2025 Workshop on Research on Agent Language Models, 2025: Developing language model-based dialogue agents requires effective data to train models that can follow specific task logic. However, most existing data simulation methods focus on increasing diversity in language, topics, or dialogue acts at the utterance level, largely neglecting a critical aspect of task logic diversity at the dialogue level. This paper proposes a novel data simulation method designed
-
Evaluating long-form AI-generated content remains challenging due to the lack of standardized methodologies that robustly align with human judgment across formats such as articles, blogs, and essays. We introduce HALF-Eval, a scalable framework that combines structured, checklist-based evaluation with machine learning aggregation to assess key quality dimensions, including creativity, impact, coherence
Conferences
Academia
Whether you're a faculty member or student, there are a number of ways you can engage with Amazon.