Web-scale semantic product search with large language models

Aashiq Muhamed; Sriram Srinivasan; Choon Hui Teo; Qingjun Cui; Belinda Zeng; Trishul Chilimbi; S. V. N. Vishwanathan


2023


Dense embedding-based semantic matching is widely used in e-commerce product search to address the shortcomings of lexical matching, such as sensitivity to spelling variants. Recent advances in BERT-like language model encoders have, however, not found their way into real-time search because of the strict inference latency requirements imposed on e-commerce websites. While bi-encoder BERT architectures enable fast approximate nearest neighbor search, training them effectively on query-product data remains a challenge due to training instabilities and the persistent generalization gap with cross-encoders. In this work, we propose a four-stage training procedure to leverage large BERT-like models for product search while preserving low inference latency. We introduce query-product interaction pre-finetuning to effectively pretrain BERT bi-encoders for matching and improve generalization. Through offline experiments on an e-commerce product dataset, we show that a distilled small BERT-based model (75M parameters) trained using our approach improves the search relevance metric by up to 23% over a baseline DSSM-based model with similar inference latency. The small model suffers only a 3% drop in the relevance metric compared to the 20x larger teacher. We also show, using online A/B tests at scale, that our approach improves over the production model in the exact and substitute products retrieved.
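To illustrate why a bi-encoder keeps query-time latency low, the following is a minimal sketch of the retrieval pattern the abstract refers to, not the authors' implementation. Queries and products are encoded independently, so product embeddings can be computed offline and only the query is encoded at search time. Everything here is an assumption made for illustration: the `encode` function is a toy bag-of-words stand-in for a trained BERT-like encoder, the helper names (`build_vocab`, `build_index`, `search`) are hypothetical, and the search is exact brute force rather than approximate nearest neighbor search.

```python
import math


def build_vocab(texts):
    """Assign each distinct lowercase token a vector index (toy vocabulary)."""
    vocab = {}
    for text in texts:
        for token in text.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(text, vocab):
    """Toy stand-in for an encoder tower: L2-normalised bag-of-words vector.

    Assumption: a real system would call a trained BERT-like model here.
    Tokens outside the vocabulary are ignored.
    """
    vec = [0.0] * len(vocab)
    for token in text.lower().split():
        if token in vocab:
            vec[vocab[token]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_index(products, vocab):
    """Offline step: embed every product once and store the vectors."""
    return [(product, encode(product, vocab)) for product in products]


def search(query, index, vocab, k=2):
    """Online step: embed the query, then rank products by dot product.

    Brute-force scoring is used for clarity; at web scale this loop would
    be replaced by an approximate nearest neighbor index.
    """
    q = encode(query, vocab)
    scored = [(sum(a * b for a, b in zip(q, emb)), product)
              for product, emb in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


products = [
    "red running shoes",
    "stainless steel water bottle",
    "wireless bluetooth headphones",
]
vocab = build_vocab(products)
index = build_index(products, vocab)
top = search("running shoes", index, vocab, k=1)
```

In this sketch the only per-query work is one call to `encode` plus the similarity scan over precomputed vectors. A cross-encoder, by contrast, would need to run the model on every query-product pair at query time, which is why it is typically too slow for first-stage retrieval.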
