Unbiased counterfactual estimation of ranking metrics
We propose a novel method for estimating the metrics of a ranking policy from behavioral signal data (e.g. clicks or video views) generated by a second, different policy. Building on , we prove that the counterfactual estimator is unbiased and discuss its low-variance property. The estimator can be used to evaluate ranking model performance offline, to validate and select position-bias models, and to serve as a learning objective when training new models.