Customer-obsessed science
Research areas
-
April 27, 2026, 4 min read. A new framework provides a statistical method for estimating the likelihood of catastrophic failures in large language models in adversarial conversations.
-
April 15, 2026, 8 min read
-
April 7, 2026, 13 min read
-
April 1, 2026, 5 min read
Featured news
-
2025. Since the seminal work of TabPFN, research on tabular foundation models (TFMs) based on in-context learning (ICL) has challenged long-standing paradigms in machine learning. Without seeing any real-world data, models pretrained on purely synthetic datasets generalize remarkably well across diverse datasets, often using only a moderate number of in-context examples. This shifts the focus in tabular machine
-
NeurIPS 2025 Workshop on Multimodal Algorithmic Reasoning, 2025. Large Language Models (LLMs) perform well on short-horizon tasks but struggle with long-horizon, multimodal scenarios that require multi-step reasoning, perception, and adaptive planning. We identify two key challenges in these settings: the difficulty of long-term coordination between planning and execution within single-agent architectures and the inefficiency of indiscriminate visual grounding. To address
-
2025. This paper investigates synthetic data generation strategies in developing generative retrieval models for domain-specific corpora, thereby addressing the scalability challenges inherent in manually annotating in-domain queries. We study the data strategies for a two-stage training framework: in the first stage, which focuses on learning to decode document identifiers from queries, we investigate LLM-generated
-
KDD 2025 Workshop on Prompt Optimization, 2025. Length control in Large Language Models (LLMs) is a crucial but under-addressed challenge, with applications ranging from voice interfaces requiring concise responses to research summaries needing comprehensive outputs. Current approaches to length control, including Regularized DPO, Length-Instruction Fine-Tuning, and tool-augmented methods, typically require expensive model retraining or complex inference-time
-
NeurIPS 2025 Workshop on Uncovering Causality in Science, 2025. Online randomized controlled experiments (A/B tests) measure causal changes in industry. While these experiments use incremental changes to minimize disruption, they often yield statistically insignificant results due to low signal-to-noise ratios. Precision improvement (or reducing standard error) traditionally focuses on trigger observations - where treatment and control outputs differ. Though effective
Collaborations
Whether you're a faculty member or student, there are a number of ways you can engage with Amazon.