Improving complementary-product recommendations
New modeling approach increases accuracy of recommendations by an average of 7%.
One way that e-commerce sites make life easier for customers is by recommending products that complement whatever the customer is looking for: someone buying a tennis racket, for instance, may also want to buy tennis balls; someone buying a camera may want an SD card for extra storage.
At this year’s Conference on Information and Knowledge Management, my colleagues at Amazon and the University of California, Los Angeles, and I will present a new deep-learning-based method for complementary-product recommendation (CPR). In our tests, it was 7% more likely than existing methods to find a product that the customer wanted to buy.
That improvement comes from three main strategies: better selection of training data for the CPR model; greater diversity in the types of products recommended; and respect for the asymmetry of the CPR problem (while an SD card may be a good product to complement a camera, a camera is not a good product to complement an SD card).
Our approach also addresses the problem of cold start, or predicting complementary products for items that were added to the product catalogue after the machine learning model was trained. To do that, we use an embedding scheme developed at Amazon, called Product2vec, to represent the inputs to the CPR model — the products we seek to complement — according to their attributes and their relationships with other products, rather than simply using their names or ID numbers.
For training data, our model, like most other CPR models, relies on implicit signals from customers. We consider three ways that product x might be related to product y: co-purchase, meaning customers who purchased x also purchased y; co-view, meaning customers who viewed x also viewed y; and purchase after view, meaning customers who viewed x eventually bought y.
CPR models typically use co-views and purchase after view as an indication of similarity and co-purchase as an indication of complementarity. But there is considerable overlap between these three categories.
Our intuition was that training a CPR model on product pairs that show up in the co-purchase data but not in the co-view and purchase-after-view data would lead to better predictions.
User studies in which participants rated pairs of products as substitutable, complementary, or irrelevant bore out this intuition: the complementarity ratings of co-purchase-only product pairs were 30% higher than those of co-purchase product pairs that also showed up in the co-view and purchase-after-view data. Accordingly, we used co-purchase-only product pairs to train our model.
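The selection of co-purchase-only pairs amounts to a set difference over the three signals. The snippet below is a minimal sketch, assuming each signal is available as a set of ordered (x, y) product pairs; the function name and the toy pairs are hypothetical and are not drawn from our data.

```python
def co_purchase_only(co_purchase, co_view, purchase_after_view):
    """Keep (x, y) pairs seen in co-purchase data but in neither other signal."""
    return set(co_purchase) - set(co_view) - set(purchase_after_view)


# Toy signals (hypothetical): each pair is (query product, related product).
co_purchase = {("camera", "sd_card"), ("camera", "camera_b"), ("racket", "balls")}
co_view = {("camera", "camera_b"), ("racket", "racket_b")}
purchase_after_view = {("camera", "camera_b")}

training_pairs = co_purchase_only(co_purchase, co_view, purchase_after_view)
```

Here the pair of two cameras is dropped because it also appears in the co-view and purchase-after-view data, which suggests the two products are substitutes rather than complements.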
The inputs to our model are Product2vec embedding vectors. Embeddings represent data items as points in a multidimensional space, such that proximity in the space indicates some relationship between the items. In our case, that relationship is similarity: points representing different brands of tennis rackets should cluster together in the space, as should points representing cameras, and so on.
Product2vec differs from other embedding schemes in that its inputs are graphs, data structures consisting of nodes (in our case, the nodes contain product information) and edges connecting the nodes (in our case, the edges represent relationships such as co-purchases and co-views).
In the same way that we train our CPR model on co-purchase-only data, we train Product2vec on pairs of products that show up in the co-view and purchase-after-view data but not in the co-purchase data. The idea is that customers might view variations of the same product before selecting one for purchase, but co-purchased products are likely to be complementary rather than similar.
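As an illustration of the kind of graph described above, the following sketch filters the similarity pairs and stores them as a simple adjacency structure. It assumes undirected similarity edges and a plain dictionary of product attributes; both are simplifications chosen for the example and do not reflect Product2vec's actual input format.

```python
from collections import defaultdict


def similarity_pairs(co_view, purchase_after_view, co_purchase):
    """Pairs seen in co-view or purchase-after-view data but not in co-purchase data."""
    return (set(co_view) | set(purchase_after_view)) - set(co_purchase)


def build_graph(attributes, edges):
    """Nodes carry product attributes; edges link products treated as similar."""
    neighbors = defaultdict(set)
    for x, y in edges:
        # Assumption: similarity is stored as an undirected edge.
        neighbors[x].add(y)
        neighbors[y].add(x)
    return {
        product: {"attributes": attrs, "neighbors": sorted(neighbors[product])}
        for product, attrs in attributes.items()
    }


# Toy catalog and signals (hypothetical).
attributes = {
    "racket_a": {"type": "tennis racket"},
    "racket_b": {"type": "tennis racket"},
    "balls": {"type": "tennis balls"},
}
co_view = {("racket_a", "racket_b"), ("racket_a", "balls")}
purchase_after_view = {("racket_a", "racket_b")}
co_purchase = {("racket_a", "balls")}

edges = similarity_pairs(co_view, purchase_after_view, co_purchase)
graph = build_graph(attributes, edges)
```

In this toy graph the two rackets are linked, while the racket-and-balls pair is excluded because it was co-purchased.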
Product2vec embedding helps solve the cold-start problem, as it will produce a meaningful embedding even for products it hasn’t seen before.
CPR models are typically trained to output the most frequent co-purchases for each input product. But this can lead to homogeneity of outputs: the top three co-purchases for a tennis racket, for instance, might be three different brands of tennis balls. We believe that customers would prefer more-diverse complementary-product recommendations: for instance, the top three recommendations for a tennis racket should be something like a can of tennis balls, a pack of overgrips, and a headband.
We enforce diversity through our model architecture. For every input product, we pass its product-type embedding through a neural network (the type transition network) that outputs the embeddings of complementary product types. Each of those embeddings is then concatenated with the embedding of the input product and passed to the module that generates the recommendations (the type-item prediction module).
The whole model is trained end to end: that is, during training, the type transition network is evaluated solely according to the accuracy of the type-item prediction module’s outputs. But each output of the type transition network is associated with a single output of the type-item prediction module, which naturally leads to greater type diversity among recommendations.
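The data flow through the two modules can be sketched as a forward pass. This is a rough illustration only: the layer shapes, the single tanh layer per complementary type, the dot-product scoring, and the random untrained weights are all assumptions made for the example, not the architecture details of our model.

```python
import math
import random

TYPE_DIM, ITEM_DIM, NUM_TYPES = 3, 4, 3  # illustrative sizes (assumed)


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def random_matrix(rng, rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def type_transition(type_embedding, heads):
    """Map one product-type embedding to one embedding per complementary type."""
    return [[math.tanh(z) for z in matvec(head, type_embedding)] for head in heads]


def type_item_prediction(item_embedding, comp_type_embedding, projection, catalog):
    """Concatenate item and complementary-type embeddings, then pick the best item."""
    query = matvec(projection, item_embedding + comp_type_embedding)
    return max(catalog, key=lambda name: sum(q * c for q, c in zip(query, catalog[name])))


def recommend(item_embedding, type_embedding, heads, projection, catalog):
    """Produce one recommendation per output of the type transition network."""
    return [
        type_item_prediction(item_embedding, comp_type, projection, catalog)
        for comp_type in type_transition(type_embedding, heads)
    ]


rng = random.Random(0)
heads = [random_matrix(rng, TYPE_DIM, TYPE_DIM) for _ in range(NUM_TYPES)]
projection = random_matrix(rng, ITEM_DIM, ITEM_DIM + TYPE_DIM)
catalog = {
    name: [rng.uniform(-1, 1) for _ in range(ITEM_DIM)]
    for name in ("balls", "overgrip", "headband", "sd_card")
}
racket_item = [rng.uniform(-1, 1) for _ in range(ITEM_DIM)]
racket_type = [rng.uniform(-1, 1) for _ in range(TYPE_DIM)]

recommendations = recommend(racket_item, racket_type, heads, projection, catalog)
```

Because the weights here are random, the recommended items are arbitrary. The point is structural: each complementary-type embedding yields exactly one recommendation, so a trained type transition network that outputs distinct types pushes the recommendations toward distinct types as well.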
The addition of the type transition network also breaks the symmetry between related products that can cause problems for the typical CPR system. The typical system bases its judgments of complementarity on proximity in the embedding space. But in that space, an SD card is as close to a camera as a camera is to an SD card.
The type transition network, however, learns to output different product-type embeddings for cameras and SD cards, which enables our model to better respond to other, asymmetric signals in the data.
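A toy example makes the symmetry problem concrete. In the sketch below, cosine similarity between two embeddings is the same in both directions, whereas a score computed after transforming the source embedding is not. The two-dimensional vectors and the hand-picked transform matrix are hypothetical; the matrix merely stands in for a learned, direction-dependent mapping such as the one the type transition network applies to product-type embeddings.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Symmetric proximity: cosine(a, b) always equals cosine(b, a)."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def directed_score(source, target, transform):
    """Asymmetric score: transform the source first, then compare with the target."""
    transformed = [dot(row, source) for row in transform]
    return dot(transformed, target)


camera = [1.0, 0.0]
sd_card = [0.8, 0.6]
transform = [[0.0, 0.0], [1.0, 0.0]]  # hand-picked stand-in for a learned mapping

symmetric = (cosine(camera, sd_card), cosine(sd_card, camera))
camera_to_sd = directed_score(camera, sd_card, transform)
sd_to_camera = directed_score(sd_card, camera, transform)
```

With these values, the camera-to-SD-card score is higher than the SD-card-to-camera score, even though the proximity of the two points is identical in both directions.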
In experiments, we used co-purchase data to compare our model’s performance to that of three leading CPR systems. We scored the models’ recommendations according to the frequency with which their recommended products were co-purchased with the input product.
On two different data sets — electronics and grocery — and three different accuracy measures — the accuracy of the top recommendation, the top three recommendations, and the top ten recommendations — our model outperformed the others across the board.
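One common way to compute top-k accuracy of this kind is a hit rate: the fraction of query products for which at least one of the top k recommendations was actually co-purchased. The sketch below assumes that definition, which may differ in detail from the metrics used in our paper; the function name and the toy data are hypothetical.

```python
def hit_at_k(recommendations, co_purchased, k):
    """Fraction of queries with at least one co-purchased item in the top k."""
    hits = sum(
        1
        for query, ranked in recommendations.items()
        if any(item in co_purchased.get(query, set()) for item in ranked[:k])
    )
    return hits / len(recommendations)


# Toy ranked recommendations and held-out co-purchases (hypothetical).
recommendations = {
    "racket": ["headband", "balls", "overgrip"],
    "camera": ["tripod", "lens_cap", "sd_card"],
}
co_purchased = {"racket": {"balls"}, "camera": {"sd_card"}}

scores = {k: hit_at_k(recommendations, co_purchased, k) for k in (1, 2, 3)}
```

In this toy data no query is satisfied by its top recommendation alone, one of the two is satisfied within the top two, and both are satisfied within the top three.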