Cross-unit spillovers in A/B testing: Empirical evidence from ads
Randomized controlled trials (RCTs) are widely used across Amazon to estimate the causal impact of proposed feature changes and to inform data-driven launch decisions. A key element of experimental design is the level of randomization, and the right choice often depends on the cross-unit interaction structure. For instance, in advertiser experiments, a treatment may affect the outcomes of control advertisers who compete with treated advertisers in ad auctions. Such spillover effects violate the Stable Unit Treatment Value Assumption (SUTVA) and therefore bias treatment effect estimates under simple advertiser-level randomization. Grouping similar advertisers into clusters and randomizing at the cluster level can mitigate this bias, but at the cost of reduced statistical power. Quantifying the magnitude of intra-cluster spillovers is therefore critical to evaluating the trade-off between simple unit randomization and cluster randomization, and to making informed decisions about the level of randomization when spillover effects are a concern. This paper proposes an empirical approach to estimating spillover effects within advertiser clusters (i.e., groups of advertisers who often interact with each other), using a historical experiment randomized on advertiser-id, joined with advertiser cluster data. The general idea is to investigate whether advertiser outcomes vary with the fraction of advertisers treated in their cluster, i.e., the treatment intensity. Specifically, we compute each advertiser’s treatment intensity by overlaying clusters that were generated prior to the experiment. We then assess whether an individual advertiser’s outcomes are affected by the treatment status of their within-cluster peers, which quantifies the degree of spillover between advertisers grouped in the same cluster.
Although we focus on the use case of advertiser-facing experiments, the approach can be used more broadly to assess the importance of spillovers in other experimental settings where cross-unit interference may be expected (e.g., seller-, region-, or product-level experiments).
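
To make the treatment-intensity idea concrete, the following is a minimal Python sketch and not the paper's actual specification. It assumes a leave-one-out definition of intensity (the share of an advertiser's cluster peers that are treated, excluding the advertiser itself) and a simple linear model of the outcome on own treatment status and peer intensity. The function names, the coefficient labels, and the toy data are hypothetical and chosen only for illustration. A real analysis would also need standard errors that account for within-cluster correlation, which this sketch omits.

```python
import numpy as np


def peer_treatment_intensity(cluster_ids, treated):
    """Share of each unit's cluster peers that are treated (leave-one-out).

    Assumption: intensity excludes the unit itself. Units in singleton
    clusters have no peers and get NaN.
    """
    cluster_ids = np.asarray(cluster_ids)
    treated = np.asarray(treated, dtype=float)
    # Map each unit to an integer cluster index.
    _, inverse = np.unique(cluster_ids, return_inverse=True)
    # Per-unit cluster size and number of treated units in the cluster.
    size = np.bincount(inverse)[inverse]
    n_treated = np.bincount(inverse, weights=treated)[inverse]
    peers = size - 1
    intensity = np.full(treated.shape, np.nan)
    has_peers = peers > 0
    intensity[has_peers] = (
        n_treated[has_peers] - treated[has_peers]
    ) / peers[has_peers]
    return intensity


def fit_spillover(outcome, treated, intensity):
    """OLS of outcome on an intercept, own treatment, and peer intensity.

    Assumption: a linear, additive model. The 'spillover' coefficient is the
    change in outcome as peer intensity moves from 0 to 1.
    """
    outcome = np.asarray(outcome, dtype=float)
    treated = np.asarray(treated, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    keep = ~np.isnan(intensity)  # drop units without peers
    X = np.column_stack([np.ones(keep.sum()), treated[keep], intensity[keep]])
    coef, *_ = np.linalg.lstsq(X, outcome[keep], rcond=None)
    return dict(zip(["intercept", "direct", "spillover"], coef))


# Toy, noise-free data (hypothetical): five advertisers in two clusters.
clusters = ["a", "a", "a", "b", "b"]
treated = [1, 0, 1, 0, 1]
intensity = peer_treatment_intensity(clusters, treated)
outcome = 1.0 + 2.0 * np.array(treated) + 3.0 * intensity
print(intensity)
print(fit_spillover(outcome, treated, intensity))
```

Under this sketch, a spillover coefficient near zero would suggest that peers' treatment status has little effect on an advertiser's outcome, while a sizeable coefficient would indicate interference within clusters.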