Query language identification with weak supervision and noisy label pruning
2021
Query language identification is an important part of a multilingual product search system. However, accurately identifying the language of product search queries is difficult for several reasons, including noise in the available datasets. In this work, we propose a learning framework that combines weak supervision with noisy label pruning. We implement this combination with models based on convolutional neural networks (CNNs). Our results show improvements over both FastText baselines and FastText with weak supervision, demonstrating the benefit of the combination.
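To make the two ingredients concrete, the following is a minimal sketch and not the method from this work, whose details the abstract does not give. It assumes that weak supervision means majority voting over heuristic labeling functions, and that noisy label pruning means dropping examples whose assigned label gets low probability from a trained model. All names here (`lf_spanish_stopwords`, `lf_english_stopwords`, `weak_label`, `prune_noisy_labels`, `predict_proba`) and the stopword lists are hypothetical; `predict_proba` stands in for any classifier, such as a CNN.

```python
from collections import Counter

ABSTAIN = None  # a labeling function returns this when it has no opinion


# Hypothetical labeling functions: each maps a query to a language code or abstains.
def lf_spanish_stopwords(query):
    words = set(query.lower().split())
    return "es" if words & {"para", "de", "el", "la"} else ABSTAIN


def lf_english_stopwords(query):
    words = set(query.lower().split())
    return "en" if words & {"for", "the", "with", "of"} else ABSTAIN


LFS = [lf_spanish_stopwords, lf_english_stopwords]


def weak_label(query, labeling_functions):
    """Majority vote over non-abstaining labeling functions (assumed scheme)."""
    votes = [v for v in (lf(query) for lf in labeling_functions) if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    # Abstain on ties instead of picking a language arbitrarily.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


def prune_noisy_labels(examples, predict_proba, threshold=0.5):
    """Keep (query, label) pairs whose label the model finds plausible.

    predict_proba is a placeholder: any callable that maps a query to a
    dict of language -> probability.
    """
    kept = []
    for query, label in examples:
        if predict_proba(query).get(label, 0.0) >= threshold:
            kept.append((query, label))
    return kept
```

In this sketch, weakly labeled queries would first train a model, that model's probabilities would then drive `prune_noisy_labels`, and the model would be retrained on the pruned set. The threshold of 0.5 is an arbitrary illustration, not a value from the source.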