Customer-obsessed science
-
November 20, 2025 · 4 min read
A new evaluation pipeline called FiSCo uncovers hidden biases and offers an assessment framework that evolves alongside language models.
-
October 20, 2025 · 4 min read
-
October 14, 2025 · 7 min read
-
October 2, 2025 · 3 min read
-
Featured news
-
Code@MIT 2025
In A/B testing, statistical power depends on both the variance of estimated impacts and the distribution of true impacts. A low-variance metric can have low power if true impacts on the metric tend to be small, while a high-variance metric can have high power if true impacts on the metric tend to be large. Traditional power calculations, however, focus solely on the variance of estimated impacts. They compute…
-
NeurIPS 2025 Workshop on Bridging Language, Agent, and World Models (LAW)
We present a framework for uncovering and exploiting dependencies among tools and documents to enhance exemplar artifact generation. Our method begins by constructing a tool knowledge graph from tool schemas (including descriptions, arguments, and output payloads) using a DeepResearch-inspired analysis. In parallel, we derive a complementary knowledge graph from internal documents and SOPs, which is then…
-
QCE 2025
The rapid evolution of quantum hardware necessitates an adaptable static analysis framework for validating quantum programs. In this work, we introduce SHARP, a rule-based static analysis framework designed for OpenQASM that decouples hardware-specific constraints from the validation engine. By employing a rule-based approach, SHARP allows quantum computing services to validate programs against evolving…
-
Code@MIT 2025
User-randomized A/B testing, while the gold standard for online experimentation, faces significant limitations when legal, ethical, or practical considerations prevent its use. Item-level randomization offers an alternative but typically suffers from high variance and low statistical power due to skewed distributions and limited sample sizes. Here we introduce Regular Balanced Switchback Designs (RBSDs)…
-
Code@MIT 2025
This paper examines the effectiveness of stratification in experimental design using evidence from multiple large-scale experiments. We analyze data from experiments ranging from approximately 30,000 to 180,000 units across different business contexts. Our results show that pre-stratification and post-stratification achieve virtually identical precision improvements, largest in smaller samples (10% improvement…
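The Code@MIT 2025 abstract on statistical power argues that power depends on how large true impacts tend to be, not only on estimation variance. A minimal sketch of that idea, under assumptions that are ours and not the paper's: a two-sided z-test, and true impacts drawn from a normal distribution N(mu, tau^2). Averaging over true impacts then has a closed form, and a noisy metric with large impacts can out-power a precise metric with small impacts. The metric parameters below are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sigma 1

def power_at_effect(delta: float, se: float, alpha: float = 0.05) -> float:
    """Traditional power: two-sided z-test against one fixed true effect `delta`."""
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(delta / se - z) + STD_NORMAL.cdf(-delta / se - z)

def expected_power(tau: float, se: float, alpha: float = 0.05, mu: float = 0.0) -> float:
    """Rejection probability averaged over true effects drawn from N(mu, tau^2).

    Marginally the estimate is N(mu, tau^2 + se^2), and the test rejects when
    |estimate| exceeds z * se, so the average has a closed form.
    """
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    sd = (tau ** 2 + se ** 2) ** 0.5
    return STD_NORMAL.cdf((mu - z * se) / sd) + STD_NORMAL.cdf((-mu - z * se) / sd)

# Hypothetical metrics: A is precise (se=0.5), B is noisy (se=2.0).
# Judged at a single fixed effect of 1.0, A looks far better...
fixed_a = power_at_effect(delta=1.0, se=0.5)
fixed_b = power_at_effect(delta=1.0, se=2.0)
# ...but if A's true impacts are small (tau=0.2) and B's are large (tau=5.0), B wins.
avg_a = expected_power(tau=0.2, se=0.5)
avg_b = expected_power(tau=5.0, se=2.0)
```

Setting `tau=0` and `mu=delta` collapses `expected_power` back to `power_at_effect`, which makes the traditional calculation the special case where every experiment has the same true impact.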
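The NeurIPS 2025 LAW workshop abstract builds a tool knowledge graph from tool schemas. As a heavily simplified stand-in (the paper describes a DeepResearch-inspired analysis, not exact name matching), the sketch below adds an edge from one tool to another whenever an output field name of the first matches an argument name of the second. The `ToolSchema` class, `build_tool_graph` function, and example tools are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ToolSchema:
    name: str
    description: str
    arguments: set[str] = field(default_factory=set)
    outputs: set[str] = field(default_factory=set)

def build_tool_graph(tools: list[ToolSchema]) -> dict[str, set[str]]:
    """Edge producer -> consumer when any output field name matches an argument name."""
    graph: dict[str, set[str]] = {tool.name: set() for tool in tools}
    for producer in tools:
        for consumer in tools:
            if producer.name != consumer.name and producer.outputs & consumer.arguments:
                graph[producer.name].add(consumer.name)
    return graph

# Hypothetical tool schemas forming a simple call chain.
tools = [
    ToolSchema("search_orders", "Find orders for a customer", {"customer_id"}, {"order_id"}),
    ToolSchema("get_shipment", "Look up the shipment for an order", {"order_id"}, {"tracking_id"}),
    ToolSchema("track_package", "Report package status", {"tracking_id"}, {"status"}),
]
graph = build_tool_graph(tools)
```

Following the edges (`search_orders` to `get_shipment` to `track_package`) yields a plausible multi-step call sequence, the kind of dependency that could seed an exemplar; a real system would need fuzzier matching on types and descriptions.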
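The QCE 2025 abstract describes SHARP, which keeps hardware-specific constraints in rules that are separate from the validation engine. The sketch below illustrates only that decoupling and is not SHARP's implementation: a toy parser for a tiny OpenQASM 3 subset (one register, gates on indexed qubits), rule factories for a hypothetical device (qubit limit, native gate set, coupling map), and an engine that runs whatever rules it is handed. All function and rule names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class GateCall:
    line: int
    gate: str
    qubits: tuple[int, ...]

# A rule sees the declared qubit count and the gate calls and returns error messages.
Rule = Callable[[int, list[GateCall]], list[str]]

def parse(source: str) -> tuple[int, list[GateCall]]:
    """Toy parser for a tiny OpenQASM 3 subset; not a full OpenQASM parser."""
    num_qubits, calls = 0, []
    for lineno, raw in enumerate(source.splitlines(), start=1):
        line = raw.split("//")[0].strip()
        if not line or line.startswith(("OPENQASM", "include")):
            continue
        decl = re.fullmatch(r"qubit\[(\d+)\]\s+\w+;", line)
        if decl:
            num_qubits += int(decl.group(1))
            continue
        call = re.fullmatch(r"(\w+)\s+(.+);", line)
        if call:
            indices = re.findall(r"\w+\[(\d+)\]", call.group(2))
            calls.append(GateCall(lineno, call.group(1), tuple(int(i) for i in indices)))
    return num_qubits, calls

def max_qubits_rule(limit: int) -> Rule:
    def check(num_qubits: int, calls: list[GateCall]) -> list[str]:
        return [f"declares {num_qubits} qubits, device allows {limit}"] if num_qubits > limit else []
    return check

def native_gates_rule(gates: set[str]) -> Rule:
    def check(num_qubits: int, calls: list[GateCall]) -> list[str]:
        return [f"line {c.line}: gate '{c.gate}' is not native" for c in calls if c.gate not in gates]
    return check

def coupling_rule(edges: set[tuple[int, int]]) -> Rule:
    undirected = edges | {(b, a) for a, b in edges}  # treat couplings as bidirectional
    def check(num_qubits: int, calls: list[GateCall]) -> list[str]:
        return [
            f"line {c.line}: {c.gate} acts on uncoupled qubits {c.qubits}"
            for c in calls
            if len(c.qubits) == 2 and c.qubits not in undirected
        ]
    return check

def validate(source: str, rules: list[Rule]) -> list[str]:
    """Hardware-agnostic engine: parse once, then run every supplied rule."""
    num_qubits, calls = parse(source)
    errors: list[str] = []
    for rule in rules:
        errors.extend(rule(num_qubits, calls))
    return errors

program = """OPENQASM 3.0;
include "stdgates.inc";
qubit[3] q;
h q[0];
cx q[0], q[1];
cx q[0], q[2];
"""
device_rules = [max_qubits_rule(5), native_gates_rule({"h", "cx", "rz"}), coupling_rule({(0, 1), (1, 2)})]
errors = validate(program, device_rules)
```

Supporting a new device here means writing a new list of rules; `validate` and `parse` stay unchanged, which is the property the abstract emphasizes.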
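The Code@MIT 2025 abstract on item-level randomization introduces Regular Balanced Switchback Designs, whose exact construction is not given here. The sketch below is therefore only a generic balanced switchback, assuming an even number of items and periods: items are split at random into two groups that alternate between treatment and control every period in opposite phase, so each period has half the items treated and each item spends half its periods treated. The effect is then estimated with a plain difference in means. Function names and data are hypothetical.

```python
import random

def balanced_switchback(items: list[str], num_periods: int, seed: int = 0) -> dict[str, list[int]]:
    """Assign each item a 0/1 (control/treatment) path over time.

    Half of the items start in treatment and half in control, and every item
    switches arms each period, so every period and every item is balanced.
    """
    if len(items) % 2 or num_periods % 2:
        raise ValueError("this sketch assumes an even number of items and periods")
    rng = random.Random(seed)
    shuffled = items[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    schedule: dict[str, list[int]] = {}
    for rank, item in enumerate(shuffled):
        start = 1 if rank < half else 0
        schedule[item] = [(start + t) % 2 for t in range(num_periods)]
    return schedule

def difference_in_means(schedule: dict[str, list[int]], outcomes: dict[str, list[float]]) -> float:
    """outcomes[item][t] is the metric for `item` in period t."""
    treated: list[float] = []
    control: list[float] = []
    for item, path in schedule.items():
        for t, arm in enumerate(path):
            (treated if arm else control).append(outcomes[item][t])
    return sum(treated) / len(treated) - sum(control) / len(control)

# Simulated data: item baselines differ widely, and treatment adds exactly 3.0.
items = ["a", "b", "c", "d"]
schedule = balanced_switchback(items, num_periods=4, seed=1)
baseline = {"a": 10.0, "b": 20.0, "c": 5.0, "d": 40.0}
outcomes = {i: [baseline[i] + 3.0 * arm for arm in schedule[i]] for i in items}
estimate = difference_in_means(schedule, outcomes)
```

Because each item contributes equally to both arms, item-level baselines (the skew the abstract mentions) cancel out of the estimate. Alternating every period ignores carryover effects, which a real switchback design would need to handle.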
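The Code@MIT 2025 abstract on stratification compares pre-stratification (randomizing within strata at design time) with post-stratification (reweighting by strata at analysis time). A minimal sketch of both, with hypothetical strata and outcomes: one function randomizes within each stratum, and the other weights within-stratum differences in means by each stratum's share of the sample. Applying the same estimator to a completely randomized experiment is what post-stratification amounts to.

```python
import random
from collections import defaultdict

def pre_stratified_assignment(strata: list[str], seed: int = 0) -> list[int]:
    """Design time: shuffle units within each stratum and treat the first half."""
    rng = random.Random(seed)
    members: dict[str, list[int]] = defaultdict(list)
    for unit, stratum in enumerate(strata):
        members[stratum].append(unit)
    assignment = [0] * len(strata)
    for units in members.values():
        rng.shuffle(units)
        for unit in units[: len(units) // 2]:
            assignment[unit] = 1
    return assignment

def post_stratified_effect(strata: list[str], assignment: list[int], outcomes: list[float]) -> float:
    """Analysis time: weight within-stratum differences in means by stratum share."""
    arms: dict[str, tuple[list[float], list[float]]] = defaultdict(lambda: ([], []))
    for stratum, arm, y in zip(strata, assignment, outcomes):
        arms[stratum][arm].append(y)  # index 0 = control, index 1 = treated
    n = len(outcomes)
    effect = 0.0
    for control, treated in arms.values():
        if not control or not treated:
            raise ValueError("every stratum needs at least one treated and one control unit")
        share = (len(control) + len(treated)) / n
        effect += share * (sum(treated) / len(treated) - sum(control) / len(control))
    return effect

# Hypothetical data: two strata with very different baselines, true effect 2.0.
strata = ["small"] * 4 + ["large"] * 4
assignment = pre_stratified_assignment(strata, seed=7)
baseline = {"small": 10.0, "large": 100.0}
outcomes = [baseline[s] + 2.0 * z for s, z in zip(strata, assignment)]
estimate = post_stratified_effect(strata, assignment, outcomes)
```

Both approaches remove the between-stratum baseline gap from the comparison, which is one intuition for why their precision gains can end up so similar.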
Collaborations
Whether you're a faculty member or a student, there are a number of ways you can engage with Amazon.