Unsupervised Translation Quality Estimation for Digital Entertainment Content Subtitles
We demonstrate the potential of aligned bilingual word embeddings for developing an unsupervised method to evaluate machine translations without the need for a parallel or reference corpus. We explain different aspects of digital entertainment content subtitles. We share our experimental results for four language pairs — English to French, German, Portuguese, and Spanish — and present findings on the shortcomings of Neural Machine Translation for subtitles. We propose several improvements over the system designed by Gupta et al. by incorporating a custom embedding model tailored to subtitles, compound-word splitting, and punctuation inclusion. We show a runtime improvement on the order of ∼600× by considering only three types of edits, removing the Proximity Intensity Index (PII), and changing the post-edit score calculation relative to their system.
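The core idea of reference-free quality estimation with aligned bilingual embeddings can be sketched as follows: score a translation by matching each source token to its most similar target token in a shared embedding space and averaging the similarities. This is a minimal illustrative sketch, not the authors' actual system; the toy embedding vectors and the `qe_score` helper are hypothetical (in practice one would use cross-lingually aligned vectors such as MUSE-aligned fastText embeddings).

```python
import math

# Toy aligned bilingual embeddings (hypothetical vectors for illustration);
# real systems use cross-lingually aligned vectors, e.g. MUSE-aligned fastText.
SRC_EMB = {  # English
    "the": [0.9, 0.1, 0.0],
    "cat": [0.1, 0.9, 0.2],
    "sleeps": [0.0, 0.2, 0.9],
}
TGT_EMB = {  # French, mapped into the same vector space
    "le": [0.88, 0.12, 0.02],
    "chat": [0.12, 0.91, 0.18],
    "dort": [0.05, 0.18, 0.92],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def qe_score(src_tokens, tgt_tokens):
    """Unsupervised QE proxy: align each source token to its most
    similar target token and average the similarities."""
    sims = []
    for s in src_tokens:
        if s not in SRC_EMB:
            continue
        best = max(
            (cosine(SRC_EMB[s], TGT_EMB[t]) for t in tgt_tokens if t in TGT_EMB),
            default=0.0,
        )
        sims.append(best)
    return sum(sims) / len(sims) if sims else 0.0

# A faithful translation scores higher than one missing a content word.
good = qe_score(["the", "cat", "sleeps"], ["le", "chat", "dort"])
bad = qe_score(["the", "cat", "sleeps"], ["le", "dort"])
print(good > bad)
```

No parallel corpus or reference translation is needed: the score depends only on the embeddings, which is what makes the approach unsupervised.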