External evaluation of ranking models under extreme position-bias
Implicit feedback from user behavior is a natural and scalable source for training and evaluating ranking models in human-interactive systems. However, inherent biases such as position bias are key obstacles to its effective use. The problem is accentuated in cases of extreme bias, where behavioral feedback can be collected exclusively on the top-ranked result. In such cases, state-of-the-art debiasing methods cannot be applied. A prominent use case of extreme position bias is the voice shopping medium, where only a small amount of information can be presented to the user during a single interaction, resulting in user behavioral signals that are almost exclusively limited to the top offer. There is no way to know how the user would have reacted to an offer other than the top one they were actually exposed to. Thus, evaluating any new ranker with respect to a behavioral metric requires online experimentation. We propose a novel approach, based on an external estimator model, for accurately predicting offline the performance of a new ranker. The accuracy of our solution is proven theoretically and demonstrated by a series of experiments. In these experiments, we focus on the use case of purchase prediction, and show that our estimator can accurately predict offline the purchase rate of different rankers over a segment of voice shopping traffic. Our prediction is validated online by comparing it to the actual performance obtained by each ranker when exposed to users.
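As a rough illustration of the offline evaluation idea, the sketch below scores a candidate ranker by averaging an external estimator's predicted purchase probability for the top offer that ranker would present on each logged query. This is a minimal sketch under stated assumptions, not the method described in the paper: the names `predicted_purchase_rate`, `Ranker`, `Estimator`, and the toy `quality` field that stands in for a trained estimator are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical types, introduced only for this illustration.
Offer = Dict[str, float]
Ranker = Callable[[str, Sequence[Offer]], Offer]   # returns the single top offer
Estimator = Callable[[str, Offer], float]          # estimated P(purchase | query, top offer)
Traffic = List[Tuple[str, List[Offer]]]            # logged (query, candidate offers) pairs


def predicted_purchase_rate(traffic: Traffic, ranker: Ranker, estimator: Estimator) -> float:
    """Mean estimated purchase probability of the ranker's top offer over the traffic."""
    if not traffic:
        raise ValueError("traffic must be non-empty")
    total = 0.0
    for query, candidates in traffic:
        top_offer = ranker(query, candidates)   # only the top offer is exposed to the user
        total += estimator(query, top_offer)
    return total / len(traffic)


# Toy stand-in for an external estimator (assumption): it simply reads a
# precomputed 'quality' score instead of calling a trained model.
def toy_estimator(query: str, offer: Offer) -> float:
    return offer["quality"]


# Two hypothetical rankers to compare offline.
def cheapest_first(query: str, candidates: Sequence[Offer]) -> Offer:
    return min(candidates, key=lambda o: o["price"])


def best_quality_first(query: str, candidates: Sequence[Offer]) -> Offer:
    return max(candidates, key=lambda o: o["quality"])


toy_traffic: Traffic = [
    ("batteries", [{"price": 5.0, "quality": 0.2}, {"price": 9.0, "quality": 0.6}]),
    ("soap", [{"price": 3.0, "quality": 0.5}, {"price": 4.0, "quality": 0.1}]),
]

rate_cheapest = predicted_purchase_rate(toy_traffic, cheapest_first, toy_estimator)
rate_quality = predicted_purchase_rate(toy_traffic, best_quality_first, toy_estimator)
```

In this toy setup the two rankers can be compared offline through `rate_cheapest` and `rate_quality`. How reliable such a comparison is depends entirely on how well the estimator approximates real user behavior, which is why the predictions still need to be checked against online performance.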