Weighted retriever ensembles for video-to-product ads curation
2024
We present a video-to-product ads curation system for MiniTV that identifies visually relevant product ads for objects of interest in videos. This retrieval task is challenging because of the domain gap between video frames and product images and the peculiarities of images extracted from videos. Image-to-product retrieval is traditionally solved with contrastive models trained on extensive labelled image data. In this paper, we present a framework that enhances traditional retrieval systems using pre-trained CLIP models. Specifically, we present three retrieval paradigms built on top of CLIP that address these challenges without manually labelled data: an attribute prediction model, an image-to-image matching model, and a self-supervised model. We analyze the strengths and weaknesses of each paradigm. Additionally, we present an ensemble model that combines all three models using a novel post-scoring weighting method. The ensemble outperforms each individual model, even without domain-specific training data: it provides a 25.7% gain in Precision@5 over the best individual model (37.85 vs. 30.66). Moreover, without any task-specific training, the ensemble achieves close to 90% of the Precision@5 of the model with task-specific training.
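The abstract does not spell out the post-scoring weighting method, so the sketch below is only an illustration of the general idea of combining several retrievers after scoring, not the authors' method. It assumes each retriever (attribute prediction, image-to-image matching, self-supervised) returns a dictionary of candidate product ids to similarity scores; the retriever names, the min-max normalization, the fixed per-retriever weights and the toy scores are all hypothetical choices. It also shows how Precision@5, the metric quoted above, is computed for a single query.

```python
from typing import Dict, List, Sequence, Set


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one retriever's candidate scores to [0, 1] so retrievers are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All candidates scored equally: treat them as equally (fully) relevant.
        return {pid: 1.0 for pid in scores}
    return {pid: (s - lo) / (hi - lo) for pid, s in scores.items()}


def weighted_ensemble(
    retriever_scores: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
    top_k: int = 5,
) -> List[str]:
    """Fuse per-retriever product scores with a weighted sum; return the top-k product ids.

    Hypothetical scheme: a product absent from a retriever's candidates contributes 0
    for that retriever, and retrievers without a weight are ignored (weight 0).
    """
    fused: Dict[str, float] = {}
    for name, scores in retriever_scores.items():
        w = weights.get(name, 0.0)
        for pid, s in min_max_normalize(scores).items():
            fused[pid] = fused.get(pid, 0.0) + w * s
    # Sort by descending fused score; break ties by product id for determinism.
    ranked = sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))
    return [pid for pid, _ in ranked[:top_k]]


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of the top-k retrieved products that are relevant (denominator is always k)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for pid in ranked[:k] if pid in relevant) / k


if __name__ == "__main__":
    # Toy scores for one video object; names and values are illustrative only.
    retriever_scores = {
        "attributes": {"p1": 0.9, "p2": 0.4, "p3": 0.1},
        "image_match": {"p2": 0.8, "p3": 0.7, "p4": 0.2},
        "self_supervised": {"p1": 0.3, "p3": 0.6, "p4": 0.5},
    }
    weights = {"attributes": 0.5, "image_match": 0.3, "self_supervised": 0.2}
    top = weighted_ensemble(retriever_scores, weights, top_k=5)
    print(top)  # -> ['p1', 'p2', 'p3', 'p4']
    print(precision_at_k(top, {"p1", "p3"}, k=5))  # -> 0.4
```

Normalizing before weighting keeps one retriever with a wider score range from dominating the sum; in practice the weights would be tuned on held-out queries rather than fixed by hand. Precision@5 here divides by 5 even when fewer than five products are returned, which is one common convention.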