Customer-obsessed science
- April 27, 2026 (4 min read): A new framework provides a statistical method for estimating the likelihood of catastrophic failures in large language models in adversarial conversations.
Featured news
- KDD 2025 Workshop on AI Agent for Information Retrieval: In this paper, we present CACHE-ED, a novel framework for document entity extraction that combines the power of large language models (LLMs) with graph-based document representations, caching mechanisms, and an actor-critic multi-agent architecture. Our approach addresses the inefficiencies and inaccuracies that are common in extracting structured information from documents, particularly in templated formats…
- Machine Learning for Healthcare 2025: Large language models demonstrate impressive performance on standardized healthcare benchmarks, yet their deployment readiness for real-world environments remains poorly understood. Current medical benchmarks present idealized scenarios that misrepresent the complexity of actual clinical data. We systematically evaluate LLM robustness by introducing clinician-validated perturbations to MedQA that mirror…
- Winter Simulation Conference 2025: The integration of Computer-Aided Design (CAD) models into discrete event simulation software is a critical requirement for many simulation projects, particularly those involving the movement of people or vehicles, where spatial accuracy directly impacts study outcomes. While importing CAD files and configuring simulation elements is essential for system accuracy, this process is typically time-consuming…
- Journal of the Royal Statistical Society, Series B (2025): Completely randomized experiments, originally developed by Fisher and Neyman in the 1930s, are still widely used in practice, even in online experimentation. However, such designs are of limited value for answering standard questions in marketplaces, where multiple populations of agents interact strategically, leading to complex patterns of spillover effects. In this paper, we derive the finite-sample properties…
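For context, a completely randomized design in the sense of Fisher and Neyman fixes the number of treated units in advance and draws the treated set uniformly at random. A minimal sketch of such an assignment (the function name and interface are illustrative, not from the paper):

```python
import random

def completely_randomized_assignment(n_units, n_treated, seed=None):
    """Assign exactly n_treated of n_units to treatment, uniformly at random.

    Classical completely randomized design: every subset of size
    n_treated is equally likely to be the treated group.
    """
    rng = random.Random(seed)
    treated = set(rng.sample(range(n_units), n_treated))
    # 1 = treated, 0 = control
    return [1 if i in treated else 0 for i in range(n_units)]
```

In a marketplace, the complication the paper points to is that units assigned this way still interact strategically, so a unit's outcome depends on others' assignments (spillovers), which the classical analysis ignores.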
- NeurIPS 2025 Workshop on Structured Probabilistic Inference & Generative Modeling: Large Language Models (LLMs) are increasingly deployed for structured data generation, yet output consistency remains critical for production applications. We introduce a comprehensive framework for evaluating and improving consistency in LLM-generated structured outputs. Our approach combines: (1) STED (Semantic Tree Edit Distance), a novel similarity metric balancing semantic flexibility with structural…
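The excerpt names STED (Semantic Tree Edit Distance) but its definition is cut off. As a rough illustration of the underlying idea only, an edit distance over JSON-like structured outputs, a minimal sketch might look like the following (the `tree_distance` and `size` helpers are hypothetical, not the paper's metric):

```python
def size(t):
    """Number of leaves in a JSON-like tree (cost of inserting or deleting it)."""
    if isinstance(t, dict):
        return sum(size(v) for v in t.values()) or 1
    if isinstance(t, list):
        return sum(size(v) for v in t) or 1
    return 1

def tree_distance(a, b):
    """Naive structural edit distance between two JSON-like trees.

    Dict nodes are compared key-by-key, lists position-by-position;
    a mismatched leaf costs 1, and a missing subtree costs its size.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        return sum(
            tree_distance(a[k], b[k]) if k in a and k in b
            else size(a[k] if k in a else b[k])
            for k in set(a) | set(b)
        )
    if isinstance(a, list) and isinstance(b, list):
        matched = sum(tree_distance(x, y) for x, y in zip(a, b))
        extra = sum(size(x) for x in a[len(b):] + b[len(a):])
        return matched + extra
    return 0 if a == b else 1
```

The paper's actual metric additionally weighs *semantic* equivalence of values (hence "semantic flexibility"), which this purely structural sketch does not attempt.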
Collaborations
Whether you're a faculty member or a student, there are a number of ways you can engage with Amazon.