Blend and Match: Distilling semantic search models with different inductive biases and model architectures
Commercial search engines use different semantic models to augment lexical matches. These models provide candidate items for a user’s query from a target space of millions to billions of items. Models with different inductive biases provide relatively different predictions, making it desirable to launch multiple semantic models in production. However, latency and resource constraints make simultaneously deploying multiple models impractical. In this paper, we introduce a distillation approach, called Blend and Match (BM), to unify two different semantic search models into a single model. We use a Bi-encoder semantic matching model as our primary model and propose a novel loss function to incorporate eXtreme Multi-label Classification (XMC) predictions as the secondary model. Our experiments conducted on two large-scale datasets, collected from a popular e-commerce store, show that our proposed approach significantly improves the recall of the primary Bi-encoder model by 11% to 17% with a minimal loss in precision. We show that traditional knowledge distillation approaches result in a sub-optimal performance for our problem setting, and our BM approach yields comparable rankings with strong Rank Fusion (RF) methods used only if one could deploy multiple models.