Unsupervised Translation Quality Estimation for Digital Entertainment Content Subtitles
We demonstrate the potential of aligned bilingual word embeddings for developing an unsupervised method to evaluate machine translations without the need for a parallel or reference corpus. We explain different aspects of digital entertainment content subtitles. We share our experimental results for four language pairs — English to French, German, Portuguese, and Spanish — and present findings on the shortcomings of Neural Machine Translation for subtitles. We propose several improvements over the system designed by Gupta et al. by incorporating a custom embedding model tailored to subtitles, compound-word splitting, and punctuation inclusion. We show a runtime improvement on the order of ∼600× by considering only three types of edits, removing the Proximity Intensity Index (PII), and changing the post-edit score calculation relative to their system.
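The core idea of reference-free quality estimation with aligned bilingual embeddings can be sketched as follows: score a translation by matching each source token to its most similar target token in a shared embedding space and averaging the similarities. This is a minimal illustrative sketch, not the authors' actual system; the toy embedding vectors and the `qe_score` helper are hypothetical (in practice one would use cross-lingually aligned vectors such as MUSE-aligned fastText embeddings).

```python
import math

# Toy aligned bilingual embeddings (hypothetical vectors for illustration);
# real systems use cross-lingually aligned vectors, e.g. MUSE-aligned fastText.
SRC_EMB = {  # English
    "the": [0.9, 0.1, 0.0],
    "cat": [0.1, 0.9, 0.2],
    "sleeps": [0.0, 0.2, 0.9],
}
TGT_EMB = {  # French, mapped into the same vector space
    "le": [0.88, 0.12, 0.02],
    "chat": [0.12, 0.91, 0.18],
    "dort": [0.05, 0.18, 0.92],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def qe_score(src_tokens, tgt_tokens):
    """Unsupervised QE proxy: align each source token to its most
    similar target token and average the similarities."""
    sims = []
    for s in src_tokens:
        if s not in SRC_EMB:
            continue
        best = max(
            (cosine(SRC_EMB[s], TGT_EMB[t]) for t in tgt_tokens if t in TGT_EMB),
            default=0.0,
        )
        sims.append(best)
    return sum(sims) / len(sims) if sims else 0.0

# A faithful translation scores higher than one missing a content word.
good = qe_score(["the", "cat", "sleeps"], ["le", "chat", "dort"])
bad = qe_score(["the", "cat", "sleeps"], ["le", "dort"])
print(good > bad)
```

No parallel corpus or reference translation is needed: the score depends only on the embeddings, which is what makes the approach unsupervised.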