Leveraging covariate adjustments at scale in online A/B testing
Companies offering web services routinely run randomized online experiments to estimate the “causal impact” associated with the adoption of new features and policies on key performance metrics of interest. These experiments are used to estimate a variety of effects: the increase in click rate due to the repositioning of a banner, the impact on subscription rate as a consequence of a discount or special offer, etc. In these settings, even effects whose sizes are very small can have large downstream impacts. The simple difference in means estimator (Splawa-Neyman et al., 1923/1990) is still the standard estimator of choice for many online A/B testing platforms due to its simplicity. This method, however, can fail to detect small effects, even when the experiment contains thousands or millions of observational units. As a byproduct of these experiments, however, large amounts of additional data (covariates) are collected. In this paper, we discuss benefits, costs and risks of allowing experimenters to leverage more complicated estimators that make use of covariates when estimating causal effects of interest. We adapt a recently proposed general-purpose algorithm for the estimation of causal effects with covariates to the setting of online A/B testing. Through this paradigm, we implement several covariate-adjusted causal estimators. We thoroughly evaluate their performance at scale, highlighting benefits and shortcomings of different methods. We show on real experiments how “covariate-adjusted” estimators can (i) lead to more precise quantification of the causal effects of interest and (ii) fix issues related to imbalance across treatment arms — a practical concern often overlooked in the literature. In turn, (iii) these more precise estimates can reduce experimentation time, cutting cost and helping to streamline decision-making processes, allowing for faster adoption of beneficial interventions.