Customer-obsessed science
- April 27, 2026, 4 min read: A new framework provides a statistical method for estimating the likelihood of catastrophic failures in large language models in adversarial conversations.
Featured news
- 2026. We present a systematic method for pruning edges from causal graphs by leveraging tiered knowledge. We characterize conditions under which edges can be removed from a causal graph while preserving the identifiability of (conditional) causal effects. This result enables causal identification on simplified graphs that are substantially smaller than the original graphs. The approach is particularly valuable...
- 2026. Gradient orthogonalization is a simple strategy that shows great utility in speeding up gradient descent. The Muon optimizer (Jordan et al., 2024b) combines gradient orthogonalization with first-order momentum and achieves significant improvement in data efficiency over Adam/AdamW (Loshchilov & Hutter, 2019a) for language model training. However, when using model parallelism, gradient orthogonalization...
- 2026. Scaling the number of parameters and the size of training data has proven to be an effective strategy for improving large language model (LLM) performance. Yet, as these models grow increasingly powerful and widely deployed, the cost of inference has become a pressing concern. Despite its importance, the tradeoff between model accuracy and inference efficiency remains underexplored. In this work, we examine...
- ICLR 2026 Workshop on Advances in Financial AI, 2026. Detecting product price outliers is important for retail and e-commerce stores as erroneous or unexpectedly high prices adversely affect competitiveness, revenue, and consumer trust. Classical techniques offer simple thresholds while ignoring the rich semantic relationships among product attributes. We propose an agentic Large Language Model (LLM) framework that treats outlier price flagging as a reasoning...
- ICLR 2026 Workshop on Catch, Adapt, and Operate: Monitoring ML Models Under Drift, 2026. Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a promising paradigm for post-training reasoning models. However, group-based methods such as Group Relative Policy Optimization (GRPO) face a critical dilemma in sparse-reward settings: pure Reinforcement Learning (RL) suffers from advantage collapse and high-variance gradient estimation, while mixed-policy optimization introduces persistent...
Collaborations
Whether you're a faculty member or student, there are a number of ways you can engage with Amazon.