LAMM: Language aware active learning for multilingual models
2023
In industrial settings, it is often necessary to meet accuracy targets at the level of individual languages. For example, Amazon business teams need to build multilingual product classifiers that operate accurately in all European languages. It is unacceptable for product classification accuracy to meet the target in one language (e.g., English) while falling below it in others (e.g., Portuguese). To address this, we propose Language Aware Active Learning for Multilingual Models (LAMM), an active learning strategy that enables a classifier to learn from a small amount of labeled data in a targeted manner, improving accuracy on low-resource languages (LRLs), for which little training data is available. Our empirical results on two open-source datasets and two proprietary product classification datasets demonstrate that LAMM improves LRL performance by 4%–11% compared to strong baselines.