Decoding and diversity in machine translation

Nicholas Roberts; Davis Liang; Graham Neubig; Zachary Lipton


2020


Neural Machine Translation (NMT) systems are typically evaluated using automated metrics that assess the agreement between generated translations and ground-truth reference translations. To improve systems with respect to these metrics, NLP researchers employ a variety of heuristic techniques, including searching for the conditional mode (rather than sampling) and incorporating various training heuristics (e.g., label smoothing). While search strategies significantly improve BLEU score, they yield deterministic outputs that lack the diversity of human translations. Moreover, search can amplify socially problematic biases in the data, as has been observed in machine translation of gender pronouns. This makes human-level BLEU a misleading benchmark: modern MT systems cannot approach human-level BLEU while simultaneously maintaining human-level translation diversity. In this paper, we characterize distributional differences between generated and real translations, examining the cost in diversity paid for the BLEU scores enjoyed by NMT. Finally, our study implicates search as a salient source of known bias when translating gender pronouns.
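To illustrate why mode-seeking search can amplify an imbalance that sampling would merely reproduce, here is a minimal toy sketch. It assumes a hypothetical model whose conditional distribution over a translated pronoun is 70% "he" and 30% "she"; these probabilities, and the names `PRONOUN_PROBS`, `mode_decode`, `sample_decode`, and `output_shares`, are illustrative assumptions rather than anything taken from the paper. Searching for the mode turns a 70/30 preference into a 100/0 outcome, while sampling keeps roughly the model's own proportions.

```python
import random
from collections import Counter

# Hypothetical toy distribution over the pronoun a model might produce when
# translating a gender-neutral source pronoun. Illustrative only, not paper data.
PRONOUN_PROBS = {"he": 0.7, "she": 0.3}


def mode_decode(probs):
    """Deterministic search: always return the single most probable option."""
    return max(probs, key=probs.get)


def sample_decode(probs, rng):
    """Ancestral sampling: draw an option in proportion to its probability."""
    options = list(probs)
    weights = [probs[o] for o in options]
    return rng.choices(options, weights=weights, k=1)[0]


def output_shares(outputs):
    """Fraction of outputs taken by each distinct option."""
    counts = Counter(outputs)
    total = len(outputs)
    return {k: counts[k] / total for k in sorted(counts)}


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sampled shares are reproducible
    n = 10_000
    searched = [mode_decode(PRONOUN_PROBS) for _ in range(n)]
    sampled = [sample_decode(PRONOUN_PROBS, rng) for _ in range(n)]
    print("mode search:", output_shares(searched))  # {'he': 1.0}
    print("sampling:   ", output_shares(sampled))   # roughly 0.7 / 0.3
```

The same logic, applied token by token in a real decoder, is one intuition for how beam or greedy search collapses output diversity; an actual NMT system's behavior depends on its learned distribution and search settings.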
