CPR: Collaborative pairwise ranking for online list recommendations
2020
Classical approaches to recommendation systems, such as collaborative filtering, learn a static model from users' historical interaction data. These approaches do not perform well in dynamic environments where the sets of users and items are continually changing. Users convey their preferences implicitly by providing feedback in the form of clicks, views, and ratings as they interact with the system. Utilizing this feedback in an online manner is crucial for building a good user experience. Contextual bandit algorithms provide a suitable framework for learning user preferences online by balancing the explore-exploit trade-off. Much of the bandit literature focuses on choosing a single item; we extend these algorithms to recommend a list of items by assuming a cascade click model. We provide an empirical study across different scenarios to showcase the benefits of collaborative online learning and exploration. Finally, we propose a novel algorithm, Collaborative Pairwise Ranking (CPR), which uses pairwise differentiable gradient descent to perform online ranking collaboratively. We show that this approach outperforms state-of-the-art collaborative bandit approaches, especially in the presence of the noisy feedback common in practical scenarios.
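To make the two ingredients named above concrete, the sketch below illustrates (a) the cascade click model, in which a user scans a recommended list top-down and clicks at most one item, and (b) a pairwise gradient update that prefers the clicked item over the items skipped above it. This is a minimal toy illustration under assumed conditions, not the paper's CPR algorithm: the names (simulate_cascade_click, PairwiseRanker), the linear scorer, the logistic pairwise loss, and all parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_cascade_click(attraction_probs):
    """Cascade click model: the user scans the list top-down and clicks
    the first attractive item; items below the click are never examined."""
    for pos, p in enumerate(attraction_probs):
        if rng.random() < p:
            return pos          # index of the clicked position
    return None                 # no click in this session

class PairwiseRanker:
    """Illustrative linear scorer trained from pairwise (clicked > skipped)
    feedback; not the paper's CPR implementation."""
    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)
        self.lr = lr

    def rank(self, item_features):
        scores = item_features @ self.w
        return np.argsort(-scores)          # best-scoring items first

    def update(self, clicked_x, skipped_x):
        # Gradient step on a logistic pairwise loss: push the clicked
        # item's score above the skipped item's score.
        diff = clicked_x - skipped_x
        p = 1.0 / (1.0 + np.exp(self.w @ diff))   # P(pair is mis-ordered)
        self.w += self.lr * p * diff

# Toy interaction loop with a hidden "true" preference vector.
dim, n_items, list_len = 5, 20, 4
items = rng.normal(size=(n_items, dim))
true_w = rng.normal(size=dim)
ranker = PairwiseRanker(dim)

for _ in range(200):
    shown = items[ranker.rank(items)[:list_len]]
    attraction = 1.0 / (1.0 + np.exp(-(shown @ true_w)))  # hidden click probs
    click = simulate_cascade_click(attraction)
    if click is not None:
        # Under the cascade assumption, every item ranked above the click
        # was examined and skipped, yielding one pairwise preference each.
        for skipped in range(click):
            ranker.update(shown[click], shown[skipped])
```

The key point the sketch conveys is how the cascade assumption turns a single click into multiple pairwise training signals; the paper builds on this idea with collaboration across users and explicit exploration, which the toy loop above omits.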