Amazon at RecSys: Evaluation, bias, and algorithms
Amazon Scholar Pablo Castells on the trends he sees in recommender-system research.
Some of the conferences where Amazon scientists publish deal with topics, such as computer vision or natural-language processing, that were subjects of scientific investigation long before practical applications were possible.
The ACM Conference on Recommender Systems (RecSys) is not one of those conferences. Recommender systems — like the ones that recommend products to customers in the Amazon Store — are very much a product of the Internet age.
“The participation of industry is very high at RecSys,” says Pablo Castells, an Amazon Scholar, an associate professor of computer science at the Universidad Autónoma de Madrid, and a senior program chair and the doctoral-program chair at this year’s RecSys. “I'd say that it's higher than in other conferences that I usually follow. But I've been an academic for most of my career, and I feel like I've been doing research on recommendation in a very academic way. I think it's possible to abstract away the problem and address mathematical questions and experiment with small data and so on.”
Castells sees three topics as being of particular interest in the field today: evaluation, bias, and algorithms.
“How do we determine if recommender systems are working properly?” he says. “That's a broad question that has its own challenges when it’s addressed online, like in a production system, with A/B tests, et cetera. It's even harder to solve it offline. I think a recent trend in this direction is to consider that the customer for a recommender system is not only the end consumer but also the seller or the service that provides the recommendation. So the effectiveness of an algorithm should be measured from different points of view, considering the different stakeholders involved in the recommendation.”
Even with a specific customer in mind, it’s not always clear how to measure a recommender system’s performance, Castells explains.
“A recommender system is effective not just if it nails what you like but, basically, if what it provides is useful. And what's useful to you depends on the context, and it also depends on the purpose. It’s not the same if, for instance, I want to listen to familiar music that I like or I want to discover music. The value of recommendation is multifaceted and multidimensional. Awareness of that has grown in the last decade or two.
“That's also related to another perspective, which is to realize that the recommender system both tries to make the user happy and needs to learn from the user. When you deliver a set of products to the user, you have two purposes. One is to please the user. Another is to learn more, and with what you learned, you can do better recommendations in the future. You actually have two goals that are not necessarily aligned.”
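The tension between pleasing the user and learning from the user is commonly framed as a trade-off between exploitation and exploration. The interview does not name a particular technique, so the sketch below is only one illustrative possibility: an epsilon-greedy policy in which each slot of a recommendation slate is either filled with the best-scoring remaining item or, with probability `epsilon`, with a random one. The function name, the `estimated_scores` dictionary, and the parameter values are all hypothetical.

```python
import random


def epsilon_greedy_slate(estimated_scores, k, epsilon, rng):
    """Build a slate of up to k items (illustrative sketch, not a production policy).

    estimated_scores: dict mapping item -> current estimate of how much the user will like it.
    epsilon: probability that a slot is used to explore rather than exploit.
    rng: a random.Random instance, passed in so results are reproducible.
    """
    # Items ordered from highest to lowest estimated score.
    remaining = sorted(estimated_scores, key=estimated_scores.get, reverse=True)
    slate = []
    for _ in range(min(k, len(remaining))):
        if rng.random() < epsilon:
            # Explore: show a random item to learn more about the user's tastes.
            choice = rng.choice(remaining)
        else:
            # Exploit: show the item currently believed to please the user most.
            choice = remaining[0]
        slate.append(choice)
        remaining.remove(choice)  # keeps the remaining items in score order
    return slate


if __name__ == "__main__":
    scores = {"a": 0.9, "b": 0.5, "c": 0.1, "d": 0.7}  # made-up estimates
    print(epsilon_greedy_slate(scores, k=2, epsilon=0.2, rng=random.Random(0)))
```

With `epsilon` set to 0, the slate is simply the top-scoring items. Raising it trades some immediate satisfaction for information that may improve later recommendations, which is the sense in which the two goals are not necessarily aligned.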
Bias, Castells says, includes but is not limited to the questions of fairness that have recently become so urgent in AI.
“Bias is a very general issue that has very different angles,” Castells says. “Bias can make your system ineffective because you are recommending the same stuff again and again. It can also mess up your measurements. If you have a bias in your experiments, you can draw the wrong conclusions, so your decisions will be suboptimal. And then bias also has to do with fairness. In an online retail environment, are you fairly recommending enough products from all suppliers? I think that that is one of the top research topics at the moment.
“Typically, fairness problems don't have a single, absolute solution. And I think that what is most important is to improve awareness of bias. If you have a good awareness of bias and unfairness, you may be close to a solution already, even if it's not perfect.”
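As a small illustration of what awareness of bias can look like in practice, the sketch below measures how recommendation slots are distributed across suppliers and sets that beside each supplier's share of the catalog. This is an assumed diagnostic, not a method described in the interview, and using catalog share as the baseline is only one of many possible reference points. The function names and the toy data are hypothetical.

```python
from collections import Counter


def _shares(counts):
    """Convert raw counts into fractions that sum to 1."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()} if total else {}


def supplier_exposure(slates, supplier_of):
    """Fraction of all recommended slots that went to each supplier."""
    return _shares(Counter(supplier_of[item] for slate in slates for item in slate))


def catalog_share(supplier_of):
    """Fraction of catalog items that belong to each supplier."""
    return _shares(Counter(supplier_of.values()))


if __name__ == "__main__":
    # Toy data: four products from two suppliers, two recommendation slates.
    supplier_of = {"p1": "A", "p2": "A", "p3": "B", "p4": "B"}
    slates = [["p1", "p2"], ["p1", "p3"]]
    print("exposure:", supplier_exposure(slates, supplier_of))
    print("catalog: ", catalog_share(supplier_of))
```

A gap between the two distributions does not prove unfairness on its own, since suppliers may differ in relevance to a given audience, but it is the kind of measurement that makes a skew visible in the first place.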
When it comes to algorithms, “deep learning would seem to be taking over prior leaders and existing algorithms,” Castells says. “There are improvements when data involves rich side information and in specific types of recommendation, like when you recommend things involving sequences. But in more-generic tasks, such as pure collaborative filtering, simpler algorithms are sometimes being found to be on par with deep learning.”
Part of the reason, Castells says, may be that “in the field of recommendation, human behavior can be less predictable and benchmarkable, and there’s value in surprise.” On that score, he says, collaborative filtering — in which recommendations for one customer are based on the past purchases of customers with similar buying profiles — enjoys advantages.
“That allows you to make recommendations that are not so predictable for the user,” Castells explains. “You're not recommending more of the same. If you only use the descriptions of what the products are, or maybe user demographics, you may tend to recommend the same thing to the user. Whereas if you find patterns in-between people, in-between products, then you can reach farther and maybe sometimes surprise. And you can add an explicit extra push beyond this in your algorithm to reach even more novel choices.”
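To make the idea concrete, here is a minimal sketch of user-based collaborative filtering over purchase histories, with an optional popularity discount standing in for the "explicit extra push" toward novel choices. It is an assumption-laden toy: the cosine similarity over purchase sets, the linear popularity discount, the `novelty_weight` parameter, and the sample data are all illustrative choices, not the approach of any particular system.

```python
import math


def cosine(a, b):
    """Cosine similarity between two purchase sets treated as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def recommend(target, purchases, k_neighbors=2, top_n=3, novelty_weight=0.0):
    """Recommend items bought by the customers most similar to `target`.

    purchases: dict mapping customer -> set of purchased items.
    novelty_weight: 0 disables the novelty push; values up to 1 increasingly
        discount items that many customers have already bought (hypothetical knob).
    """
    own = purchases[target]
    # Rank the other customers by similarity and keep the k nearest with any overlap.
    neighbors = sorted(
        ((cosine(own, items), user) for user, items in purchases.items() if user != target),
        reverse=True,
    )[:k_neighbors]
    neighbors = [(sim, user) for sim, user in neighbors if sim > 0]

    # Popularity = number of customers who bought each item.
    popularity = {}
    for items in purchases.values():
        for item in items:
            popularity[item] = popularity.get(item, 0) + 1

    # Score unseen items by the summed similarity of the neighbors who bought them.
    scores = {}
    for sim, user in neighbors:
        for item in purchases[user] - own:
            scores[item] = scores.get(item, 0.0) + sim

    # Novelty push: scale each score down in proportion to the item's popularity.
    n_users = len(purchases)
    for item in scores:
        scores[item] *= 1.0 - novelty_weight * popularity[item] / n_users

    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


if __name__ == "__main__":
    purchases = {  # made-up purchase histories
        "ana": {"guitar", "strings"},
        "ben": {"guitar", "strings", "tuner", "capo"},
        "cal": {"guitar", "tuner"},
        "dee": {"novel", "tuner"},
    }
    print(recommend("ana", purchases))                      # similarity only
    print(recommend("ana", purchases, novelty_weight=1.0))  # with novelty push
```

In this toy data, the plain version ranks the widely bought tuner first, while the novelty-weighted version promotes the less common capo. Nothing in the code looks at product descriptions or demographics; the recommendations come entirely from patterns shared between customers.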
Of course, given the success of deep learning in other fields, it would be a mistake to underestimate its potential in the field of recommendation.
“It's difficult to see a conclusive answer in conference proceedings as to what algorithm is best because much of what you can see published at RecSys is based on public data, and the field still needs to improve on experimental standards, reproducibility, and benchmarking,” Castells explains. “Deep learning in general is known to require massive data to be at its best. In conferences, you most often find experiments with much more limited data, and maybe that's not the type of experiment where deep learning would do the best.
“So my impression is that the question about what algorithmic approach is best is still open. I haven't seen full agreement that ‘Yeah, forget about linear matrix factorization, forget about k-nearest neighbors, you need to do deep learning.’ It may depend on the application, on the data — even on the experiment configuration. Maybe next year this point will be reached, but I haven’t seen it yet.”