- Code@MIT 2025, 2025. In A/B testing, statistical power depends on both the variance of estimated impacts and the distribution of true impacts. A low variance metric can have low power if true impacts on the metric tend to be small, while a high variance metric can have high power if true impacts on the metric tend to be large. Traditional power calculations, however, focus solely on the variance of estimated impacts. They compute
- Code@MIT 2025, 2025. User-randomized A/B testing, while the gold standard for online experimentation, faces significant limitations when legal, ethical, or practical considerations prevent its use. Item-level randomization offers an alternative but typically suffers from high variance and low statistical power due to skewed distributions and limited sample sizes. We here introduce Regular Balanced Switchback Designs (RBSDs)
- Code@MIT 2025, 2025. This paper examines the effectiveness of stratification in experimental design using evidence from multiple large-scale experiments. We analyze data from experiments ranging from approximately 30,000 to 180,000 units across different business contexts. Our results show that pre-stratification and post-stratification achieve virtually identical precision improvements - largest in smaller samples (10% improvement
- Code@MIT 2025, 2025. Determining appropriate experimental duration remains a challenging problem in online experimentation. While experimenters ideally would know in advance how long to run experiments in order to inform confident business decisions, many factors affecting conclusiveness of their results are difficult to predict prior to the experiment. Consequently, experimentation services develop 'in-flight' tools that suggest
- KDD 2025 Workshop on AI for Supply Chain, 2025. Effective attribution of causes to outcomes is crucial for optimizing complex supply chain operations. Traditional methods, often relying on waterfall logic or correlational analysis, frequently fall short in identifying the true drivers of performance issues. This paper proposes a comprehensive framework leveraging data-driven causal discovery to construct and validate Structural Causal Models (SCMs).
Related content
- July 29, 2025: New cost-to-serve-software metric that accounts for the full software development lifecycle helps determine which software development innovations provide quantifiable value.
- May 21, 2025: By combining surveys with ads targeted to metro and commuter rail lines, Amazon researchers identify the fraction of residents of different neighborhoods exposed to the ads and measure ad effectiveness.
- December 16, 2024: In a keynote address at the latest Amazon Machine Learning Conference, Amazon academic research consultant, Stanford professor, and recent Nobel laureate Guido Imbens offered insights on the estimation of causal effects in “panel data” settings.
- October 21, 2024: Causal machine learning provides a powerful tool for estimating the effectiveness of Fulfillment by Amazon’s recommendations to selling partners.
- April 30, 2024: Using causal random forests and Bayesian structural time series to extrapolate from sparse data ensures that customers get the most useful information as soon as possible.
- March 21, 2024: The principal economist and his team address unique challenges using techniques at the intersection of microeconomics, statistics, and machine learning.