KG-FLIP: Knowledge-guided fashion-domain language-image pre-training for e-commerce
2023
Various Vision-Language Pre-training (VLP) models (e.g., CLIP, BLIP) have sprung up and dramatically improved the benchmarks of public general-domain datasets (e.g., COCO, Flickr30k). Such models typically learn cross-modal alignment from large-scale, well-aligned image-text datasets. Adapting these models to downstream applications in specific domains, such as fashion, requires fine-grained in-domain image-text datasets. However, such datasets are usually less semantically aligned and smaller in scale, which calls for more efficient pre-training strategies. In this paper, we propose a knowledge-guided fashion-domain language-image pre-training (KG-FLIP) framework that focuses on learning fine-grained representations in the e-commerce domain and utilizes external knowledge (i.e., the product attribute schema) to improve pre-training efficiency. Experimental results demonstrate that KG-FLIP outperforms previous state-of-the-art VLP models on Amazon data and the Fashion-Gen dataset by large margins. KG-FLIP has been successfully deployed in the Amazon catalog system to backfill missing attributes and improve the customer shopping experience.
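To make the knowledge-guidance idea concrete, below is a minimal, hypothetical sketch of one way an attribute schema could steer pre-training: tokens that match known attribute values are masked at a higher rate than ordinary tokens, so the masked-modeling objective spends more of its budget on attribute-bearing words. The schema format, function name, and masking rates here are illustrative assumptions, not the paper's actual implementation.

```python
import random

# Hypothetical product-attribute schema: attribute name -> known values.
# KG-FLIP uses the catalog attribute schema as external knowledge; this
# flat dictionary format is an assumption made for illustration.
ATTRIBUTE_SCHEMA = {
    "sleeve_type": {"long sleeve", "short sleeve", "sleeveless"},
    "neckline": {"v-neck", "crew neck", "halter"},
    "material": {"cotton", "linen", "polyester"},
}

MASK_TOKEN = "[MASK]"


def knowledge_guided_mask(tokens, schema, attr_prob=0.5, base_prob=0.15, seed=None):
    """Mask attribute-value tokens more aggressively than ordinary tokens.

    Tokens appearing in any schema value are masked with probability
    `attr_prob`; all other tokens fall back to the standard `base_prob`
    masked-language-modeling rate. Both rates are illustrative.
    """
    rng = random.Random(seed)
    # Flatten schema values into a set of single-word knowledge tokens.
    knowledge_words = {
        w for values in schema.values() for v in values for w in v.split()
    }
    masked, labels = [], []
    for tok in tokens:
        p = attr_prob if tok.lower() in knowledge_words else base_prob
        if rng.random() < p:
            masked.append(MASK_TOKEN)
            labels.append(tok)   # reconstruction target for the MLM loss
        else:
            masked.append(tok)
            labels.append(None)  # position ignored by the MLM loss
    return masked, labels


tokens = "women 's long sleeve cotton v-neck blouse".split()
masked, labels = knowledge_guided_mask(tokens, ATTRIBUTE_SCHEMA, seed=0)
print(masked)  # attribute words like "cotton" are masked preferentially
```

Under this sketch, the model is pushed to recover attribute values from image context, which aligns with the deployed use case of backfilling missing catalog attributes.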