Customer-obsessed science
Research areas
-
June 8, 2026, 7 min read. Four approaches can dramatically improve the performance and trustworthiness of AI agents in operational environments.
May 27, 2026, 4 min read. Machine learning
Featured news
-
KDD 2025 Workshop on Prompt Optimization, 2025. Length control in Large Language Models (LLMs) is a crucial but under-addressed challenge, with applications ranging from voice interfaces requiring concise responses to research summaries needing comprehensive outputs. Current approaches to length control, including Regularized DPO, Length-Instruction Fine-Tuning, and tool-augmented methods, typically require expensive model retraining or complex inference-time
-
NeurIPS 2025 Workshop on Uncovering Causality in Science, 2025. Online randomized controlled experiments (A/B tests) measure causal changes in industry. While these experiments use incremental changes to minimize disruption, they often yield statistically insignificant results due to low signal-to-noise ratios. Precision improvement (or reducing standard error) traditionally focuses on trigger observations, where treatment and control outputs differ. Though effective
-
KDD 2025 Workshop on AI Agent for Information Retrieval, 2025. In this paper, we present CACHE-ED, a novel framework for document entity extraction that combines the power of large language models (LLMs) with graph-based document representations, caching mechanisms, and an actor-critic multi-agent architecture. Our approach addresses the inefficiencies and inaccuracies that are common in extracting structured information from documents, particularly in templated formats
-
NeurIPS 2025 Workshop on Mathematical Reasoning and AI, 2025. We present an approach for training language models to interactively prove theorems using the Lean proof assistant. Our approach enables models to propose partial proofs, receive verification feedback, and iteratively refine their proofs. We develop a synthetic data generation pipeline that converts static proof datasets into multi-turn interactive sequences, complete with incremental verification feedback
-
Machine Learning for Healthcare 2025, 2025. Large language models demonstrate impressive performance on standardized healthcare benchmarks, yet their deployment readiness for real-world environments remains poorly understood. Current medical benchmarks present idealized scenarios that misrepresent the complexity of actual clinical data. We systematically evaluate LLM robustness by introducing clinician-validated perturbations to MedQA that mirror
Collaborations
Whether you're a faculty member or student, there are a number of ways you can engage with Amazon.