Query transformation for multi-lingual product search
In this paper, we study the problem of enabling multi-lingual product search for a global shopping store. In particular, given an existing search system and product catalog in a primary language, and a search query in a secondary language, the task is to transform the query into a semantically equivalent one in the primary language so that the search system retrieves the most relevant products. Direct application of machine translation does not always work well in this setting, for several reasons: 1) it does not consider how the search system responds to a transformed query, 2) it is sensitive to spelling and grammatical errors, 3) it is fragile to inputs in a language other than the ones it was trained on, and 4) it handles named entities (e.g. brand names, model numbers) poorly. To address these challenges, we propose a Query Transformation system that consists of 1) a language identifier that detects the language of the input query, 2) a deep neural machine translation model that is fine-tuned on a human-curated parallel query corpus and trained to copy entities such as model numbers, and 3) a traffic re-ranker that selects the transformation most likely to help the search system retrieve the most relevant products. Furthermore, we show that standard machine translation evaluation metrics such as BLEU are unsuitable for this application. We therefore propose a new offline performance metric that measures how accurately a transformed query reflects the customer’s shopping intent and how well the existing search system responds to the transformed query. We present compelling offline and online results: improvements of 11% and 3% in offline nDCG@8 for Spanish (ES) → English (EN) and French (FR) → EN, respectively, and reductions of 10% and 22% in online product type search defects for ES→EN and FR→EN, respectively, over a state-of-the-art statistical machine translation system for product search.
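For readers unfamiliar with the nDCG@8 figure quoted above, the following is a minimal sketch of the textbook nDCG@k computation in Python. It is not the paper's evaluation code: the function names, the linear-gain variant, and the choice to compute the ideal ranking from the returned list alone are all assumptions made for illustration, and the graded relevance labels would in practice come from whatever judgments the evaluation uses.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k results (linear gain, log2 discount)."""
    # rank is 0-based, so the result at position 1 is discounted by log2(2) = 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int = 8) -> float:
    """nDCG@k: DCG of the returned order divided by DCG of the ideal order.

    Assumption: the ideal order is obtained by sorting the same list of
    relevance labels; an evaluation with a larger judged pool would sort
    that pool instead.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant results at all
    return dcg_at_k(relevances, k) / ideal


if __name__ == "__main__":
    # Hypothetical relevance labels for the products returned for one transformed query.
    print(ndcg_at_k([3, 2, 3, 0, 1, 2, 0, 0], k=8))
```

Under this definition, a transformed query scores higher when the existing search system places relevant products nearer the top of the first eight results, which is why a retrieval-based metric can reflect search quality in a way that a string-overlap metric such as BLEU does not.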