Leveraging transformers to improve product retrieval results

Assessing the absolute utility of query results, rather than just their relative utility, improves learning-to-rank models.

August 3, 2023

3 min read

When a customer clicks on an item in a list of product-search results, it implies that that item is better than those not clicked. “Learning to rank” models leverage such implicit feedback to improve search results, comparing clicked and unclicked results in either “pairwise” (comparing pairs of results) or listwise (judging a results position within the list) fashion.

A problem with this approach is the lack of absolute feedback. For instance, if no items in the selection are clicked, it’s a signal that none of the results was useful. But without clicked items for comparison, learning-to-rank models can do nothing with that information. Similarly, if a customer clicks on all the items in a list, it could indicate that all the results were useful — but it could also indicate a fruitless search to find even one useful result. Again, learning-to-rank models can’t tell the difference.

Absolute feedback

Although the items in the public datasets are scored according to how well they match various search queries, we are chiefly interested in learning from implicit feedback, since that scales much better than learning from labeled data.

Models adapted from information retrieval deal well with noisy GPS input and can leverage map information.

We thus assign each item in our datasets a value of 0 if it isn’t clicked on, a value of 1 if it is clicked on, and a value of 2 if it’s purchased. We define the absolute value of a list of items as the value of its single highest-value member, on the assumption that the purpose of a product query is to identify a single item for purchase. A list with one item that results in a purchase thus has a higher value than a list all of whose items were clicked without purchase.

As input, our transformer model receives information about each product in a list of products, but it also receives a class token. For each input, it generates a vector representation: the representations of the products capture information useful for assessing how well they match a product query, but the representation of the class token captures information about the list as a whole.

The representations pass to a set of scoring heads, which score them according to their relevance to the current query. During training, however, the score of the class token and the product scores are optimized according to separate loss functions.

In addition to product features *(X_I)*, the inputs to our model include a classification token *x_[CLS]*. The transformer outputs *(Z_I and z_[CLS])* pass to scoring heads *(h_s and h_d)*. During training, the product scores *(s(X_I))* and the class token score *(d(X_I)* are optimized according to different loss functions *(L_y and L_t)*.

The transformer’s key design feature is its attention mechanism, which learns how heavily to weight different input features according to context. In our setting, the attention mechanism determines which product features are of particular import given a product query.

Does it help?

Although our results on proprietary data were more impressive, we evaluated our proposal on publicly available datasets so that the research community can verify our results. On the internal data at Amazon Search, where a richer set of features is available, our model achieves better performance than any of the other methods, including strong GBDT models.

On the strength of those results, we are encouraged to keep learning from customer feedback. The user’s perspective is central to ranking problems, and click and purchase data appears to be a signal ripe for further research.

About the Author

Paul Missault

Paul Missault is a senior applied scientist at Amazon.

Leveraging transformers to improve product retrieval results

Assessing the absolute utility of query results, rather than just their relative utility, improves learning-to-rank models.

Absolute feedback

Does it help?

Related content

Work with us