Customer-obsessed science
- March 20, 2026 · 15 min read · Simplifying and clarifying the assembly code for core operations enabled automated optimization and verification.
Featured news
- 2026 · We present a systematic method for pruning edges from causal graphs by leveraging tiered knowledge. We characterize conditions under which edges can be removed from a causal graph while preserving the identifiability of (conditional) causal effects. This result enables causal identification on simplified graphs that are substantially smaller than the original graphs. The approach is particularly valuable…
- 2026 · Gradient orthogonalization is a simple strategy that shows great utility in speeding up gradient descent. The Muon optimizer (Jordan et al., 2024b) combines gradient orthogonalization with first-order momentum and achieves significant improvement in data efficiency over Adam/AdamW (Loshchilov & Hutter, 2019a) for language model training. However, when using model parallelism, gradient orthogonalization…
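As background for the teaser above: the core step in Muon-style optimizers is mapping a gradient matrix toward the nearest (semi-)orthogonal matrix. A minimal sketch using the simple cubic Newton-Schulz iteration (Muon itself uses a tuned quintic polynomial; the function name here is illustrative, not from the paper):

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=5, eps=1e-7):
    """Approximately orthogonalize a gradient matrix G: for G = U S V^T,
    iterate toward U V^T. Simple cubic Newton-Schulz variant, which
    converges when all singular values lie in (0, sqrt(3))."""
    X = G / (np.linalg.norm(G) + eps)  # Frobenius scaling puts singular values <= 1
    for _ in range(steps):
        A = X @ X.T
        X = 1.5 * X - 0.5 * A @ X      # X <- (3 X - X X^T X) / 2
    return X
```

For well-conditioned inputs a handful of iterations suffices; very small singular values need more steps, which is one reason Muon uses a tuned higher-order polynomial instead of this cubic map.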
- 2026 · Scaling the number of parameters and the size of training data has proven to be an effective strategy for improving large language model (LLM) performance. Yet, as these models grow increasingly powerful and widely deployed, the cost of inference has become a pressing concern. Despite its importance, the tradeoff between model accuracy and inference efficiency remains underexplored. In this work, we examine…
- ICLR 2026 Workshop on Advances in Financial AI, 2026 · Detecting product price outliers is important for retail and e-commerce stores, as erroneous or unexpectedly high prices adversely affect competitiveness, revenue, and consumer trust. Classical techniques offer simple thresholds while ignoring the rich semantic relationships among product attributes. We propose an agentic Large Language Model (LLM) framework that treats outlier price flagging as a reasoning…
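As a point of contrast for the agentic approach above, the "simple thresholds" classical baseline can be sketched as a robust z-score rule over prices in one product category; the function name and the 3.5 cutoff are illustrative assumptions, not from the paper:

```python
import statistics

def flag_price_outliers(prices, z_thresh=3.5):
    """Flag prices whose robust (median/MAD-based) z-score exceeds a
    threshold. Uses median absolute deviation instead of the standard
    deviation so a single extreme price cannot mask itself."""
    med = statistics.median(prices)
    mad = statistics.median(abs(p - med) for p in prices) or 1e-9
    # 0.6745 rescales MAD to be comparable to a standard deviation
    return [p for p in prices if abs(0.6745 * (p - med) / mad) > z_thresh]
```

A rule like this sees only the numbers, which is exactly the limitation the abstract points at: it cannot tell that $900 is plausible for a laptop but not for a laptop sleeve.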
- 2026 · Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a promising paradigm for post-training reasoning models. However, group-based methods such as Group Relative Policy Optimization (GRPO) face a critical dilemma in sparse-reward settings: pure Reinforcement Learning (RL) suffers from advantage collapse and high-variance gradient estimation, while mixed-policy optimization introduces persistent…
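For context on the "advantage collapse" mentioned above, here is a minimal sketch of GRPO's group-relative advantage under binary verifiable rewards (the function name is illustrative):

```python
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: for a group of rollouts sampled from the
    same prompt, standardize each reward against the group mean and
    standard deviation. If every rollout in the group gets the same
    sparse reward (all 0 or all 1), every advantage is zero and the
    group contributes no learning signal -- advantage collapse."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```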
Collaborations
Whether you're a faculty member or a student, there are a number of ways you can engage with Amazon.