Using hypergraphs to improve product retrieval
Augmenting query-product graphs with hypergraphs describing product-product relationships improves recall score by more than 48%.
Information retrieval engines like the one that helps Amazon customers find products in the Amazon Store commonly rely on bipartite graphs that map queries to products. The graphs are typically based on customer behaviors: if enough customers executing the same query click the link to a product or buy that product, the graph will include an edge between query and product. A graph neural network (GNN) can then ingest the graph and predict edges corresponding to new queries.
This approach has two drawbacks. One is that most products in the Amazon Store belong to the long tail of items that are rarely searched for, which means that they don’t have enough associated data to make GNN training reliable. Conversely, when handling long-tail queries, a GNN will tend to match them to popular but probably unrelated products, simply because they have a high click and purchase rate overall. This phenomenon is known as disassortative mixing.
In a paper we presented at the ACM Conference on Web Search and Data Mining (WSDM), we address both of these problems by augmenting the bipartite query-product graph with information about which products customers tend to look at during the same online shopping sessions. The idea is that knowing which types of products are related to each other can help the GNN generalize from high-frequency to low-frequency queries.
To capture the information about product relationships, we use a hypergraph, a generalization of the graph structure: where an edge in an ordinary graph links exactly two nodes, an edge in a hypergraph can link any number of nodes. Other information retrieval approaches have used product similarity to improve performance, but modeling product similarity with the hypergraph allows us to use GNNs for prediction, so we can exploit the added structure available in graph representations of data.
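A common way to encode a hypergraph is an incidence matrix, with one row per node and one column per hyperedge. The sketch below builds one for a toy set of products, with each hyperedge grouping products viewed in the same shopping session; the data and variable names are illustrative, not from the paper.

```python
import numpy as np

# A toy hypergraph over 5 products. Each hyperedge groups products
# viewed in the same shopping session (illustrative data only).
num_nodes = 5
hyperedges = [
    [0, 1, 2],  # session 1 viewed products 0, 1, 2
    [1, 3],     # session 2 viewed products 1, 3
    [2, 3, 4],  # session 3 viewed products 2, 3, 4
]

# Incidence matrix: H[v, e] = 1 if node v belongs to hyperedge e.
H = np.zeros((num_nodes, len(hyperedges)), dtype=int)
for e, members in enumerate(hyperedges):
    H[members, e] = 1

print(H)
```

Note that an ordinary graph is the special case in which every column of the incidence matrix has exactly two nonzero entries.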
In tests, we compared our approach to one that uses GNNs on the bipartite graph alone. Adding the hypergraph improved the mean reciprocal rank of the results (which assigns a higher score the closer the correct answer is to the top of a ranked list) by almost 25%, and the recall score (the percentage of correct answers retrieved) by more than 48%.
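For readers unfamiliar with these metrics, here is a minimal sketch of how mean reciprocal rank and recall are typically computed over a set of queries; the toy rankings and relevance sets are invented for illustration.

```python
def mean_reciprocal_rank(ranked_lists, relevant):
    """MRR: average over queries of 1/rank of the first relevant item."""
    rr = []
    for ranking, rel in zip(ranked_lists, relevant):
        score = 0.0
        for i, item in enumerate(ranking, start=1):
            if item in rel:
                score = 1.0 / i
                break
        rr.append(score)
    return sum(rr) / len(rr)

def recall_at_k(ranked_lists, relevant, k):
    """Fraction of all relevant items that appear in the top k results."""
    hits, total = 0, 0
    for ranking, rel in zip(ranked_lists, relevant):
        hits += len(set(ranking[:k]) & rel)
        total += len(rel)
    return hits / total

# Two toy queries with known relevant products (illustrative only).
ranked = [["p3", "p1", "p7"], ["p2", "p5", "p9"]]
relevant = [{"p1"}, {"p2", "p9"}]
print(mean_reciprocal_rank(ranked, relevant))  # (1/2 + 1/1) / 2 = 0.75
print(recall_at_k(ranked, relevant, k=2))      # (1 + 1) / (1 + 2) ≈ 0.667
```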
GNNs produce vector representations, or embeddings, of individual graph nodes that capture information about their neighbors. The process is iterative: the first embedding captures only information about the object associated with the node — in our case, product descriptions or query semantics. The second embedding combines the first embedding with those of the node’s immediate neighbors; the third embedding extends the node’s neighborhood by one hop; and so on. Most applications use one- or two-hop embeddings.
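The iterative neighborhood aggregation described above can be sketched with a simplified mean-aggregation layer; a real GNN layer would also apply a learned weight matrix and a nonlinearity, which are omitted here, and the features and adjacency matrix are toy values.

```python
import numpy as np

def gnn_layer(X, A):
    """One hop of mean-aggregation message passing (simplified: no
    learned weights or nonlinearity). Concatenating the node's own
    embedding with its neighbors' mean extends its view by one hop."""
    deg = A.sum(axis=1, keepdims=True).clip(min=1)  # avoid divide-by-zero
    neighbor_mean = (A @ X) / deg
    return np.concatenate([X, neighbor_mean], axis=1)

# Toy 2-d content embeddings for 4 nodes, and a small adjacency matrix.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)

h1 = gnn_layer(X, A)   # one-hop embeddings
h2 = gnn_layer(h1, A)  # two-hop embeddings
print(h2.shape)        # each hop folds in a wider neighborhood
```

Stacking two such layers gives the two-hop embeddings that, as noted above, most applications use.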
The embedding of a hypergraph modifies this procedure slightly. The first iteration, as in the standard case, embeds each of the item nodes individually. The second iteration creates an embedding for the entirety of each hyperedge. The third iteration then produces an embedding for each node that factors in both its own content-level embedding and the embeddings of all the hyperedges it touches.
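The node-to-hyperedge-to-node procedure can be sketched as a simplified hypergraph convolution: average node embeddings into an embedding per hyperedge, then average each node's incident hyperedge embeddings back into the node. Learned weights are again omitted, and the incidence matrix is a toy example.

```python
import numpy as np

def hypergraph_layer(X, H):
    """Simplified hypergraph convolution. Step 1 embeds each hyperedge
    as the mean of its member nodes; step 2 embeds each node as the
    mean of the hyperedges it touches (no learned weights here)."""
    edge_deg = H.sum(axis=0, keepdims=True).clip(min=1)  # nodes per hyperedge
    E = (H.T @ X) / edge_deg.T                           # hyperedge embeddings
    node_deg = H.sum(axis=1, keepdims=True).clip(min=1)  # hyperedges per node
    return (H @ E) / node_deg                            # back to node embeddings

# 4 products with 2-d content embeddings; two sessions as hyperedges.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
H = np.array([[1, 0],                  # product 0 in session 1
              [1, 1],                  # product 1 in both sessions
              [0, 1],                  # product 2 in session 2
              [0, 1]], dtype=float)    # product 3 in session 2

Z = hypergraph_layer(X, H)
print(Z)
```

Products that share a session end up with correlated embeddings even if they are never retrieved by the same query, which is what lets the model transfer signal to long-tail items.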
The architecture of our model has two channels, one for the query-item bipartite graph and one for the item-item hypergraph. Each channel feeds its own GNN (a graph convolutional network), yielding an embedding for each node.
During training, an attention mechanism learns how much weight to give the embedding produced by each channel. A common query with a few popular associated products, for instance, may be well represented by the standard GNN embedding of the bipartite graph. A rarely purchased item, by contrast, associated with a few diverse queries, may benefit from greater weighting of the hypergraph embedding.
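The attention fusion can be sketched as follows for a single node; the `attn_vector` stands in for learned attention parameters (an assumption for illustration — in the actual model the weighting is trained end to end).

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # subtract max for numerical stability
    return e / e.sum()

def fuse(z_bipartite, z_hyper, attn_vector):
    """Blend the two channel embeddings of one node, with per-channel
    weights produced by a softmax over attention scores."""
    scores = np.array([attn_vector @ z_bipartite, attn_vector @ z_hyper])
    w = softmax(scores)  # per-channel weights, summing to 1
    return w[0] * z_bipartite + w[1] * z_hyper, w

z_bi = np.array([0.9, 0.1])  # bipartite-graph channel embedding (toy)
z_hg = np.array([0.2, 0.8])  # hypergraph channel embedding (toy)
fused, weights = fuse(z_bi, z_hg, attn_vector=np.array([1.0, 0.0]))
print(weights)  # the channel with the higher attention score gets more weight
```

For a popular query, training would push the weights toward the bipartite channel; for a rarely purchased item, toward the hypergraph channel.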
To maximize the quality of our model’s predictions, we also experimented with two different unsupervised pretraining methods. One is a contrastive-learning approach in which the GNN is fed pairs of training examples. Some are positive pairs, whose embeddings should be as similar as possible, and some are negative pairs, whose embeddings should be as different as possible.
Following existing practice, we produce positive pairs by randomly deleting edges or nodes of a source graph, so the resulting graphs are similar but not identical. Negative pairs match the source graph with a different, randomly chosen graph. We extend this procedure to the hypergraph and ensure consistency between the two channels’ training data; e.g., a node deleted from one channel’s inputs will also be deleted from the other channel’s.
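The cross-channel consistency rule can be sketched like this: the same randomly chosen product nodes are removed from both the bipartite edge list and the session hyperedges when building an augmented view. The function name and toy data are illustrative, not from the paper.

```python
import random

def consistent_node_drop(bipartite_edges, hyperedges, nodes,
                         drop_frac=0.2, seed=0):
    """Build one augmented view by dropping a random subset of product
    nodes from BOTH channels, so the two views stay consistent."""
    rng = random.Random(seed)
    dropped = set(rng.sample(sorted(nodes), int(len(nodes) * drop_frac)))
    new_bipartite = [(q, p) for q, p in bipartite_edges if p not in dropped]
    new_hyper = [[p for p in he if p not in dropped] for he in hyperedges]
    new_hyper = [he for he in new_hyper if len(he) >= 2]  # prune degenerate edges
    return new_bipartite, new_hyper, dropped

nodes = {"p0", "p1", "p2", "p3", "p4"}
bipartite = [("q0", "p0"), ("q0", "p1"), ("q1", "p2"), ("q2", "p4")]
sessions = [["p0", "p1", "p2"], ["p2", "p3", "p4"]]
bi_view, hg_view, dropped = consistent_node_drop(bipartite, sessions, nodes)
# The same products are absent from both augmented views.
assert all(p not in dropped for _, p in bi_view)
assert all(p not in dropped for he in hg_view for p in he)
```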
We also experiment with DropEdge, a procedure in which, in successive training epochs, slightly different versions of the same graph are used, with a few edges randomly dropped. This prevents overfitting and oversmoothing, as it encourages the GNN to learn more abstract representations of its inputs.
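The core of DropEdge is a per-epoch random edge mask. A minimal sketch, with a toy edge list:

```python
import random

def drop_edge(edges, drop_rate=0.1, rng=None):
    """DropEdge: keep each edge independently with probability
    1 - drop_rate, so the GNN sees a slightly different graph
    at every training epoch."""
    rng = rng or random.Random()
    return [e for e in edges if rng.random() >= drop_rate]

edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
for epoch in range(3):
    epoch_edges = drop_edge(edges, drop_rate=0.2, rng=random.Random(epoch))
    # ... train one epoch on the thinned graph ...
    print(len(epoch_edges))
```

Because a node's aggregated neighborhood changes from epoch to epoch, the network cannot rely on any single edge, which is what discourages overfitting and oversmoothing.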
Pretraining dramatically improves the quality of both our two-channel model and the baseline GNN. But it also increases the discrepancy between the two. That is, our approach by itself sometimes yields only a modest improvement over the baseline model. But our approach with pretraining outperforms the baseline model with pretraining by a larger margin.