Customer-obsessed science
Research areas
-
June 8, 2026, 7 min read. Four approaches can dramatically improve the performance and trustworthiness of AI agents in operational environments.
-
May 26, 2026, 5 min read
Featured news
-
Machine Learning for Healthcare 2025, 2025. Large language models demonstrate impressive performance on standardized healthcare benchmarks, yet their deployment readiness for real-world environments remains poorly understood. Current medical benchmarks present idealized scenarios that misrepresent the complexity of actual clinical data. We systematically evaluate LLM robustness by introducing clinician-validated perturbations to MedQA that mirror
-
Winter Simulation Conference 2025, 2025. The integration of Computer-Aided Design (CAD) models into discrete event simulation software is a critical requirement for many simulation projects, particularly those involving the movement of people or vehicles where spatial accuracy directly impacts study outcomes. While importing CAD files and configuring simulation elements is essential for system accuracy, this process is typically time-consuming
-
AACL 2025, 2025. Search queries with superlatives (e.g., best, most popular) require comparing candidates across multiple dimensions, demanding linguistic understanding and domain knowledge. We show that LLMs can uncover latent intent behind these expressions in e-commerce queries through a framework that extracts structured interpretations or hints. Our approach decomposes queries into attribute-value hints generated concurrently
-
Journal of the Royal Statistical Society, Series B, 2025. Completely randomized experiments, originally developed by Fisher and Neyman in the 1930s, are still widely used in practice, even in online experimentation. However, such designs are of limited value for answering standard questions in marketplaces, where multiple populations of agents interact strategically, leading to complex patterns of spillover effects. In this paper, we derive the finite-sample properties
-
NeurIPS 2025 Workshop on Structured Probabilistic Inference & Generative Modeling, 2025. Large Language Models (LLMs) are increasingly deployed for structured data generation, yet output consistency remains critical for production applications. We introduce a comprehensive framework for evaluating and improving consistency in LLM-generated structured outputs. Our approach combines: (1) STED (Semantic Tree Edit Distance), a novel similarity metric balancing semantic flexibility with structural
Collaborations
Whether you're a faculty member or student, there are a number of ways you can engage with Amazon.