GEM: Translation-free zero-shot global entity matcher for global catalogs
2021
We propose a modular BiLSTM/ CNN /Transformer deep-learning encoder architecture, together with a data synthesis and training approach, to solve the problem of matching catalog products across different languages, different local catalogs, and different catalog data contributors. The end-to-end model relies solely on raw natural language textual data in the catalog entries and on images of the products, without any feature engineering, and is entirely translation-free, not requiring the translation of the catalog natural language data to the same base language for inference. We report experiments results on a 4-languages-scope model (English, French, German, Spanish) matching entities from 4 local catalogs (UK, France, Germany, Spain) of a retail website. We demonstrate that the model achieves performance comparable to state-of-the-art existing entity matchers that operate within a single language, and that the model achieves high-performance zero-shot inference on language pairs not seen in training.
Research areas