CPR: Collaborative pairwise ranking for online list recommendations
2020
Classical approaches to recommendation systems, such as collaborative filtering, learn a static model from users' historical interaction data. These approaches do not perform well in dynamic environments where the sets of users and items are continually changing. Users convey their preferences implicitly by providing feedback in the form of clicks, views, and ratings as they interact with the system. Utilizing this feedback in an online manner is crucial for building a good user experience. Contextual bandit algorithms provide a suitable framework for learning user preferences online by balancing the explore-exploit trade-off. Much of the bandit literature focuses on choosing a single item; we extend these algorithms to recommend a list of items by assuming a cascade click model. We provide an empirical study across different scenarios to showcase the benefits of collaborative online learning and exploration. Finally, we propose a novel algorithm, Collaborative Pairwise Ranking (CPR), which uses pairwise differentiable gradient descent to perform online ranking collaboratively. We show that this approach outperforms state-of-the-art collaborative bandit approaches, especially in the presence of the noisy feedback common in practical scenarios.
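To make the two ingredients named above concrete, the sketch below illustrates (a) the cascade click model, in which a user scans a recommended list top-down and clicks at most one item, and (b) a pairwise gradient update that prefers the clicked item over the items skipped above it. This is a minimal toy illustration under assumed conditions, not the paper's CPR algorithm: the names (simulate_cascade_click, PairwiseRanker), the linear scorer, the logistic pairwise loss, and all parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_cascade_click(attraction_probs):
    """Cascade click model: the user scans the list top-down and clicks
    the first attractive item; items below the click are never examined."""
    for pos, p in enumerate(attraction_probs):
        if rng.random() < p:
            return pos          # index of the clicked position
    return None                 # no click in this session

class PairwiseRanker:
    """Illustrative linear scorer trained from pairwise (clicked > skipped)
    feedback; not the paper's CPR implementation."""
    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)
        self.lr = lr

    def rank(self, item_features):
        scores = item_features @ self.w
        return np.argsort(-scores)          # best-scoring items first

    def update(self, clicked_x, skipped_x):
        # Gradient step on a logistic pairwise loss: push the clicked
        # item's score above the skipped item's score.
        diff = clicked_x - skipped_x
        p = 1.0 / (1.0 + np.exp(self.w @ diff))   # P(pair is mis-ordered)
        self.w += self.lr * p * diff

# Toy interaction loop with a hidden "true" preference vector.
dim, n_items, list_len = 5, 20, 4
items = rng.normal(size=(n_items, dim))
true_w = rng.normal(size=dim)
ranker = PairwiseRanker(dim)

for _ in range(200):
    shown = items[ranker.rank(items)[:list_len]]
    attraction = 1.0 / (1.0 + np.exp(-(shown @ true_w)))  # hidden click probs
    click = simulate_cascade_click(attraction)
    if click is not None:
        # Under the cascade assumption, every item ranked above the click
        # was examined and skipped, yielding one pairwise preference each.
        for skipped in range(click):
            ranker.update(shown[click], shown[skipped])
```

The key point the sketch conveys is how the cascade assumption turns a single click into multiple pairwise training signals; the paper builds on this idea with collaboration across users and explicit exploration, which the toy loop above omits.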