Language-agnostic representation learning for product search on e-commerce platforms
Product search forms an indispensable component of any e-commerce service, and helps customers find products of their interest from a large catalog on these websites. When products that are irrelevant to the search query are surfaced, it leads to a poor customer experience, thus reducing user trust and increasing the likelihood of churn. While identifying and removing such results from product search is crucial, doing so is a burdensome task that requires large amounts of human annotated data to train accurate models. This problem is exacerbated when products are cross-listed across countries that speak multiple languages, and customers specify queries in multiple languages and from different cultural contexts. In this work, we propose a novel multi-lingual multi-task learning framework, to jointly train product search models on multiple languages, with limited amount of training data from each language. By aligning the query and product representations from different languages into a language-independent vector space of queries and products, respectively, the proposed model improves the performance over baseline search models in any given language. We evaluate the performance of our model on real data collected from a leading e-commerce service. Our experimental evaluation demonstrates up to 23% relative improvement in the classification F1-score compared to the state-of-the-art baseline models.