Optimizing duration of online experiments via Bayesian early termination
2025
Determining an appropriate experiment duration remains a challenging problem in online experimentation. Ideally, experimenters would know in advance how long to run an experiment to support a confident business decision, but many factors that affect the conclusiveness of results are difficult to predict beforehand. Consequently, experimentation services develop 'in-flight' tools that suggest early termination based on observed data. We present a novel optimization framework to guide the duration and early termination of online experiments, with the goal of detecting with confidence whether an experiment can be stopped earlier than planned on the basis of partial evidence from early results. We formulate a general optimization problem that yields an algorithm recommending early termination of an ongoing experiment when its partial 'early' results, collected over a shorter duration than originally planned, meet certain conditions. In our algorithm, the parameters that determine whether an experiment can be cut short are functions of experiment metadata, the launch criteria specified by the experimenter, and historical results. Using production data from real online experiments, we show that our metadata-aware approach achieves high rates of early termination (reaching more than 10% of all experiments run) while maintaining low error rates. Here, an error is a case where the partial results at early termination suggest an outcome that differs from what would be observed if the experiment were run to completion; our approach minimizes such misaligned recommendations. The approach also reveals intuitive patterns: experiments with high-stringency launch criteria receive conservative thresholds, while lenient criteria allow aggressive early decisions.
This work demonstrates that principled metadata-aware optimization can dramatically improve early termination systems while maintaining statistical rigor.
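The abstract does not state the optimization problem itself, so the following is only a minimal sketch of the general idea, under stated assumptions. It picks the decision threshold that maximizes the early-termination rate on past experiments while keeping the error rate below a cap. The names `PastExperiment`, `p_early`, `evaluate`, and `choose_threshold`, the symmetric stopping rule, and the definition of error rate as a fraction of early-terminated experiments are all hypothetical choices for illustration, not the paper's method.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class PastExperiment:
    # Hypothetical record of a completed experiment (not from the paper).
    p_early: float   # early posterior probability that the launch criteria are met
    launched: bool   # decision actually reached at the full planned duration


def evaluate(history: Sequence[PastExperiment], threshold: float) -> Tuple[float, float]:
    """Return (termination_rate, error_rate) for one threshold.

    Assumed rule: stop early with "launch" if p_early >= threshold, stop early
    with "do not launch" if p_early <= 1 - threshold, otherwise keep running.
    The error rate is measured among early-terminated experiments only.
    """
    if not history:
        return 0.0, 0.0
    stopped = 0
    errors = 0
    for exp in history:
        if exp.p_early >= threshold:
            early_decision = True
        elif exp.p_early <= 1.0 - threshold:
            early_decision = False
        else:
            continue  # inconclusive: the experiment runs to completion
        stopped += 1
        if early_decision != exp.launched:
            errors += 1
    error_rate = errors / stopped if stopped else 0.0
    return stopped / len(history), error_rate


def choose_threshold(
    history: Sequence[PastExperiment],
    max_error: float,
    grid: Sequence[float] = (0.90, 0.95, 0.99),
) -> Optional[float]:
    """Pick the grid threshold with the highest termination rate whose
    error rate on the history does not exceed max_error (None if none does)."""
    best: Optional[float] = None
    best_rate = -1.0
    for threshold in grid:
        rate, error_rate = evaluate(history, threshold)
        if error_rate <= max_error and rate > best_rate:
            best, best_rate = threshold, rate
    return best


history = [
    PastExperiment(0.99, True),
    PastExperiment(0.97, True),
    PastExperiment(0.92, False),  # early evidence pointed the wrong way
    PastExperiment(0.03, False),
    PastExperiment(0.60, True),
    PastExperiment(0.08, True),   # early evidence pointed the wrong way
]
strict = choose_threshold(history, max_error=0.05)
lenient = choose_threshold(history, max_error=0.50)
```

On this toy history a tight error cap selects a more conservative threshold than a loose one. Running the same selection separately per metadata group, with a tighter cap for high-stringency experiments, is one way the pattern described in the abstract could arise under these assumptions.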