Mitigating targeting bias in content recommendation with causal bandits
Recommendation systems play a central role in improving customer experience on the Amazon retail website. Commonly, Learning-to-Rank (LTR) methods are employed to rank content; however, these methods are subject to bias inherent in the observational data used for training. This paper studies a domain-specific self-selection bias, called Content Targeting Bias, introduced when content is generated for specific targeted customers. When content specifically targets classes of customers who are more or less likely to take the actions associated with traditional recommendation algorithms (clicks, purchases), the resulting observations reflect a biased relationship between the content and the feedback. These observations do not account for the counterfactual condition, i.e., what would have happened if the customer had not received the recommendation. In many cases, customers have a high propensity to generate rewards independent of the recommendations shown on the website. In this work we incorporate causal uplift modeling with contextual bandits in order to use the heterogeneous treatment effect as an adjusted objective for top-k content selection. We demonstrate the performance and impact of the framework through both offline model evaluations and multiple live A/B experiments.
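To make the adjusted objective concrete, the following is a minimal sketch (not the paper's implementation) of uplift-based top-k selection in a T-learner style: two hypothetical reward models give `mu1`, the predicted reward if an item is recommended, and `mu0`, the predicted reward if it is not; the uplift score is their difference, an estimate of the heterogeneous treatment effect. All numbers and model outputs below are illustrative assumptions.

```python
import numpy as np

# Hypothetical predicted rewards for 4 content items for one customer context.
# mu1[i]: P(action | item i shown); mu0[i]: P(action | item i not shown).
mu1 = np.array([0.90, 0.80, 0.30, 0.60])
mu0 = np.array([0.85, 0.20, 0.05, 0.50])

# Estimated heterogeneous treatment effect (uplift) per item.
uplift = mu1 - mu0

k = 2
# Naive top-k by predicted reward alone (subject to targeting bias):
topk_reward = np.argsort(-mu1)[:k]   # -> items 0 and 1
# Uplift-adjusted top-k, the counterfactual-aware objective:
topk_uplift = np.argsort(-uplift)[:k]  # -> items 1 and 2

print(topk_reward, topk_uplift)
```

Item 0 illustrates the bias the abstract describes: the customer has a high propensity to act on it regardless of the recommendation (mu0 is nearly as high as mu1), so ranking by raw predicted reward selects it, while ranking by uplift correctly deprioritizes it in favor of items whose outcomes the recommendation actually changes.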