Amazon wins contest to control "formality" in machine translation
Data augmentation and post-editing strategies lift Amazon’s submission above competitors.
In a recent shared task at the International Conference on Spoken Language Translation (IWSLT), titled Formality Control for Machine Translation, the formality-controlled translation system developed by Alexa AI ranked first. On English→Japanese translation, it outperformed the second-best system by 9.8 percentage points in absolute accuracy, according to human evaluation. It also achieved almost perfect accuracy (99.6% formal, 99.8% informal) for English→Hindi translation.
What is formality control for machine translation?
Machine translation (MT) models typically return a single translation for each input, without regard to the intended use case or target audience. This kind of unconditional translation is useful in many cases but fails to account for differences in language use in different parts of the world. Leaving the model to choose between different valid options can lead to translations that use an inappropriate degree of formality, which can be perceived as rude or jarring for speakers from certain cultures and in certain use cases, such as customer support chat.
Controlling translation formality with two-stage fine-tuning
We used a two-stage fine-tuning strategy to train our formality-controlled machine translation model. First, we trained a generic neural-machine-translation (NMT) model by fine-tuning the mBART multilingual language model on a large-scale parallel-translation corpus. Then we further fine-tuned the generic NMT model on formality-annotated data, in which each training example carries a formality tag, <formal> or <informal>. During inference, the operator controls the formality level of the translation by appending the chosen formality tag to the input text.
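As a rough illustration of the tagging step, the sketch below attaches a formality tag to a source sentence before it is passed to a translation model. The function name, the dictionary, and the choice of a single space as separator are assumptions made for this example; the article specifies only that the tag is appended to the input text.

```python
# Illustrative sketch: the helper name and the space separator are assumptions.
FORMALITY_TAGS = {"formal": "<formal>", "informal": "<informal>"}


def tag_source(text: str, formality: str) -> str:
    """Append the formality tag for `formality` to the source sentence."""
    if formality not in FORMALITY_TAGS:
        raise ValueError(f"unknown formality level: {formality!r}")
    return f"{text.strip()} {FORMALITY_TAGS[formality]}"


if __name__ == "__main__":
    print(tag_source("How are you?", "formal"))
    print(tag_source("How are you?", "informal"))
```

The same helper would be applied to the formality-annotated training examples, so that the model sees the tag in the same position during fine-tuning and inference.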
Tackling data sparsity
A unique challenge in the IWSLT shared task is data sparsity: only a few hundred formality-annotated samples are available for fine-tuning the NMT model. We therefore devised a data augmentation method that uses linguistic cues to automatically annotate a small seed set of target-language (i.e., Hindi and Japanese) texts with formality labels. We then used the seed set to train a multilingual-BERT (mBERT) language model as a multilingual text formality classifier, and we used that classifier to mine large parallel corpora for additional formality-labeled examples.
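The two augmentation steps described above can be sketched as follows. Everything specific here is an assumption for illustration: the Japanese cue lists are a small hypothetical sample, `classify` stands in for the trained mBERT classifier, and the confidence threshold is an invented filtering rule that the article does not describe.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical sentence-final cues for Japanese (illustrative, far from complete).
JA_FORMAL_CUES = ("です", "ます", "ません", "ました")
JA_INFORMAL_CUES = ("だ", "だよ", "だね")


def label_by_cues(sentence: str) -> Optional[str]:
    """Return 'formal', 'informal', or None when no cue matches."""
    body = sentence.rstrip("。！？!?. ")
    if body.endswith(JA_FORMAL_CUES):
        return "formal"
    if body.endswith(JA_INFORMAL_CUES):
        return "informal"
    return None


def mine_pairs(
    pairs: Iterable[Tuple[str, str]],
    classify: Callable[[str], Tuple[str, float]],
    threshold: float = 0.9,  # assumed cutoff, not taken from the article
) -> List[Tuple[str, str, str]]:
    """Keep (source, target, label) triples the classifier is confident about."""
    mined = []
    for source, target in pairs:
        label, confidence = classify(target)
        if confidence >= threshold:
            mined.append((source, target, label))
    return mined
```

In this sketch, `label_by_cues` would produce the seed set, and `mine_pairs` would be called with a function that wraps the trained classifier and returns a label together with its confidence.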
Critical to our system’s performance is a set of post-editing techniques that further correct the output generated by the formality model. We first developed two post-editing techniques that leverage language-specific formality rules. The first, called T-V form conversion, identifies and adjusts the contextual use of pronouns that convey formality or familiarity.
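A toy version of T-V form conversion for Hindi might look like the sketch below. The lookup table is a hypothetical three-entry sample, and the rule is a context-free token swap, whereas the technique described above takes context into account. The verb entry is included because changing the pronoun in Hindi also changes verb agreement.

```python
# Hypothetical informal -> formal token table for Hindi (illustrative only).
HI_INFORMAL_TO_FORMAL = {
    "तुम": "आप",        # "you" (familiar) -> "you" (polite)
    "तुम्हारा": "आपका",  # "your" (familiar) -> "your" (polite)
    "हो": "हैं",          # verb form that agrees with the polite pronoun
}


def to_formal(sentence: str) -> str:
    """Swap whitespace-separated tokens found in the table; keep the rest."""
    tokens = sentence.split()
    return " ".join(HI_INFORMAL_TO_FORMAL.get(token, token) for token in tokens)


if __name__ == "__main__":
    print(to_formal("तुम कैसे हो"))
```

Because the sketch splits on whitespace, a token with attached punctuation is left unchanged; a practical rule set would need proper tokenization and agreement handling.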
The second technique, called verb conjugation, changes the verb to express different formality levels. For example, in Japanese, we can add the “-ます[masu]” suffix to verbs to make a sentence polite without changing its meaning.
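As a minimal sketch of this kind of rule, the code below rewrites a sentence-final plain-form verb into its polite "-ます" form. The four-verb lookup table and the function name are assumptions for illustration; real Japanese conjugation depends on the verb class and would need a morphological analyzer rather than a fixed table.

```python
# Hypothetical plain -> polite lookup for a handful of verbs (illustrative only).
PLAIN_TO_POLITE = {
    "食べる": "食べます",  # to eat
    "行く": "行きます",    # to go
    "見る": "見ます",      # to see
    "する": "します",      # to do
}


def make_polite(sentence: str) -> str:
    """Replace a sentence-final plain-form verb from the table with its polite form."""
    body = sentence.rstrip("。")
    trailing = sentence[len(body):]  # keep any final full stop
    for plain, polite in PLAIN_TO_POLITE.items():
        if body.endswith(plain):
            return body[: -len(plain)] + polite + trailing
    return sentence  # no known verb at the end: leave unchanged


if __name__ == "__main__":
    print(make_polite("明日、東京に行く。"))
```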
Beyond these, we further devised a language-agnostic post-editing strategy using a sequence-to-sequence (seq2seq) pointer generator network. A pointer network is a seq2seq model whose output is a pointer back to the input, so it carries inputs over to the output. A pointer generator network is a pointer network with the option of generating new outputs for particular inputs. It’s thus a good choice for an application like formality control, which changes only certain elements of an input text.
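To make the copy-or-generate idea concrete, the sketch below computes the output distribution for one decoding step using the common formulation in which a generation probability mixes the vocabulary distribution with the attention weights over source tokens. This is an assumption about the general technique; the article does not say which variant the system uses, and all names and numbers are illustrative.

```python
from typing import Dict, List


def final_distribution(
    p_gen: float,
    vocab_dist: Dict[str, float],
    attention: List[float],
    source_tokens: List[str],
) -> Dict[str, float]:
    """Mix generation and copying for a single decoding step.

    P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (attention mass on source
    positions whose token is w).
    """
    final = {word: p_gen * prob for word, prob in vocab_dist.items()}
    for token, weight in zip(source_tokens, attention):
        final[token] = final.get(token, 0.0) + (1.0 - p_gen) * weight
    return final


if __name__ == "__main__":
    dist = final_distribution(
        p_gen=0.3,
        vocab_dist={"行きます": 0.8, "明日": 0.2},
        attention=[0.9, 0.1],
        source_tokens=["明日", "行く"],
    )
    # The most probable output is whichever token has the largest mixed mass.
    print(max(dist, key=dist.get))
```

With a low generation probability, most of the mass goes to copying source tokens, which matches the post-editing setting where the bulk of the sentence should pass through unchanged.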
In our offline experiments using the IWSLT challenge test set, we found that data augmentation using the formality classifier improved formality control accuracy in English→Japanese translation by 2.3 percentage points. We also found that post-editing strategies on top of fine-tuned mBART models are simple and effective ways to improve performance. In particular, for Japanese translation, they improved the formal accuracy from 93.9% to 95.5% and the informal accuracy from 98.1% to 100%. For Hindi, we achieved 100% formal translation accuracy, and informal accuracy improved from 84.4% to 97.8%.
Acknowledgements: Jiang Yu, Ashwin Ganesan, Sarah Campbell