Dataset helps evaluate gender bias in machine translation models

Test set includes 1,150 text segments, each in nine languages.

In recent years, machine translation systems have become much more accurate and fluent. As their use expands, it has become increasingly important to ensure that they are as fair, unbiased, and accurate as possible.

For example, machine translation systems sometimes incorrectly translate the genders of people referred to in input segments, even when an individual’s gender is unambiguous based on the linguistic context. Such errors can have an outsize impact on the correctness and fairness of translations. We refer to this problem as one of gender translation accuracy.

Translation bias.no_accent.png
Machine translation models sometimes mistranslate the genders of people mentioned in input texts, even when their genders are unambiguous in context.

To make it easier to evaluate gender translation accuracy in a wide variety of scenarios, my colleagues and I at Amazon Translate have released a new evaluation benchmark: MT-GenEval. We describe the benchmark in a paper we are presenting at the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP).

Related content
Method significantly reduces bias while maintaining comparable performance on machine learning tasks.

MT-GenEval is a large, realistic evaluation set that covers translation from English into eight diverse and widely spoken (but in some cases understudied) languages: Arabic, French, German, Hindi, Italian, Portuguese, Russian, and Spanish. In addition to 1,150 segments of evaluation data per language pair, we release 2,400 parallel sentences for training and development.

Unlike widely used bias test sets that are artificially constructed, MT-GenEval data is based on real-world data sourced from Wikipedia and includes professionally created reference translations in each of the languages. We also provide automatic metrics that evaluate both the accuracy and quality of gender translations.

Gender representations

To get a sense of where gender translation inaccuracies often arise, it is helpful to understand how different languages represent gender. In English, there are some words that unambiguously identify gender, such as she (female gender) or brother (male gender).

A quick guide to Amazon's 40+ papers at EMNLP 2022

Explore Amazon researchers’ accepted papers which address topics like information extraction, question answering, query rewriting, geolocation, and pun generation.

Many languages, including those covered in MT-GenEval, have a more extensive system of grammatical gender, where nouns, adjectives, verbs, and other parts of speech can be marked for gender. To give an example, the Spanish translation of “a tall librarian” is different if the librarian is a woman (una bibliotecaria alta) or a man (un bibliotecario alto).

Related content
Eliminating the need for annotation makes bias testing much more practical.

When a machine translation model translates from a language with no or limited gender (like English) into a language with extensive grammatical gender (like Spanish), it must not only translate but also correctly express the genders of words that lack gender in the input. For example, with the English sentence “He is a tall librarian,” the model must correctly select the male grammatical gender for “a” (un, not una), “tall” (alto, not alta) and “librarian” (bibliotecario, not bibliotecaria), all based on the single input word “He.”

In the real world, input texts are often more complex than this simple example, and the word that disambiguates an individual’s gender might be very far — potentially even in another sentence — from the words that express gender in the translation. In these cases, we have observed that machine translation models have a tendency to disregard the disambiguating context and even fall back on gender stereotypes (such as translating “pretty” as female and “handsome” as male, regardless of the context).

While we have seen several anecdotal cases of these types of gender translation accuracy issues, until now there has not been a way to systematically quantify such cases in realistic, complex input text. With MT-GenEval, we hope to bridge this gap.

Building the dataset

To create MT-GenEval, we first scoured English Wikipedia articles to find candidate text segments each of which contained at least one gendered word in a span of three sentences. Because we wanted to ensure that the segments were relevant for evaluating gender accuracy, we asked human annotators to exclude any sentences that either did not refer to individuals (e.g., “The movie She’s All That was released in 1999”) or did not unambiguously express the gender of that individual (e.g., “You are a tall librarian”).

Related content
Open-source library enables optimization of hyperparameters to maximize performance while meeting fairness constraints.

Then, in order to balance the test set by gender, the annotators created counterfactuals for the segments, where each individual’s gender was changed either from female to male or from male to female. (In its initial release, MT-GenEval covers two genders: female and male.) For example, “He is a prince and will someday be king” would be changed to “She is a princess and will someday be queen.” This type of balancing ensures that differently gendered subsets do not have different meanings. Finally, professional translators translated each sentence into the eight target languages.

A balanced test set also allows us to evaluate gender translation accuracy, because for every segment, it provides a correct translation, with correct genders, and a contrastive translation, which differs from the correct translation only in gender-specific words. In the paper, we propose a straightforward accuracy metric: for a given translation with the desired gender, we consider all the gendered words in the contrastive reference. If the translation contains any of the gendered words in the contrastive reference, it is marked incorrect; otherwise, it’s marked correct. Our automatic metric agreed with annotators reasonably well, with F scores of over 80% across all eight target languages (English was the source language).

Related content
Method presented to ICML workshop works with any machine learning model and fairness criterion.

While this assesses translations on the lexical level, we also introduce a metric to measure differences in machine translation quality for masculine and feminine outputs. We define this gender quality gap as the difference in BLEU scores on the masculine and feminine subsets of the balanced dataset.

Given this extensive curation and annotation, MT-GenEval is a step forward for evaluation of gender accuracy in machine translation. We hope that by releasing MT-GenEval, we can inspire more researchers to work on improving gender translation accuracy on complex, real-world inputs in a variety of languages.

Research areas

Related content

US, NY, New York
We are seeking a Robotics/AI Motor Control Scientist to develop cutting-edge machine learning algorithms for motor control systems in robots. In this role, you will focus on creating and optimizing intelligent motor control strategies to enable robots to perform complex, whole-body tasks. Your contributions will be essential in advancing robotics by enabling fluid, reliable, and safe interactions between robots and their environments. Key job responsibilities - Develop controllers that leverage reinforcement learning, imitation learning, or other advanced AI techniques to achieve natural, robust, and adaptive motor behaviors - Collaborate with multi-disciplinary teams to integrate motor control systems with robotic hardware, ensuring alignment with real-world constraints such as actuator dynamics and energy efficiency - Use simulation and real-world testing to refine and validate control algorithms - Stay updated on advancements in robotics, AI, and control systems to apply advanced techniques to robotic motion challenges - Lead technical projects from conception through production deployment - Mentor junior scientists and engineers - Bridge research initiatives with practical engineering implementation About the team Fauna Robotics, an Amazon company, is building capable, safe, and genuinely delightful robots for everyday life. Our goal is simple: make robots people actually want to live and interact with in everyday human spaces. We believe that future won’t arrive until building for robotics becomes far more accessible. Today, too much effort is spent reinventing the fundamentals. We’re changing that by developing tightly integrated hardware and software systems that make it faster, safer, and more intuitive to create real-world robotic products. Our work spans the full stack: mechanical design, control systems, dynamic modeling, and intelligent software. The focus is not just functionality, but experience. We’re building robots that feel responsive, expressive, and genuinely useful. At Fauna, you’ll work at the frontier of this space, helping define how robots move, manipulate, and interact with people in natural environments. It’s an opportunity to solve hard problems across hardware and software with a team focused on making robotics accessible and joyful to build. If you care about making robotics real for everyone and building systems that are as delightful as they are capable, we’re interested in hearing from you. an opportunity to solve hard problems across hardware and software with a team focused on making robotics accessible and joyful to build. If you care about making robotics real for everyone and building systems that are as delightful as they are capable, we’re interested in hearing from you.
US, WA, Seattle
Amazon.com strives to be Earth's most customer-centric company where customers can shop in our stores to find and discover anything they want to buy. We hire the world's brightest minds, offering them a fast paced, technologically sophisticated and friendly work environment. Economists in the Forecasting, Macroeconomics & Finance field document, interpret and forecast Amazon business dynamics. This track is well suited for economists adept at combining times-series statistical methods with strong economic analysis and intuition. This track could be a good fit for candidates with research experience in: macroeconometrics and/or empirical macroeconomics; international macroeconomics; time-series econometrics; forecasting; financial econometrics and/or empirical finance; and the use of micro and panel data to improve and validate traditional aggregate models. Economists at Amazon are expected to work directly with our senior management and scientists from other fields on key business problems faced across Amazon, including retail, cloud computing, third party merchants, search, Kindle, streaming video, and operations. The Forecasting, Macroeconomics & Finance field utilizes methods at the frontier of economics to develop formal models to understand the past and the present, predict the future, and identify relevant risks and opportunities. For example, we analyze the internal and external drivers of growth and profitability and how these drivers interact with the customer experience in the short, medium and long-term. We build econometric models of dynamic systems, using our world class data tools, formalizing problems using rigorous science to solve business issues and further delight customers.
US, WA, Seattle
Amazon.com strives to be Earth's most customer-centric company where customers can shop in our stores to find and discover anything they want to buy. We hire the world's brightest minds, offering them a fast paced, technologically sophisticated and friendly work environment. Economists at Amazon partner closely with senior management, business stakeholders, scientist and engineers, and economist leadership to solve key business problems ranging from Amazon Web Services, Kindle, Prime, inventory planning, international retail, third party merchants, search, pricing, labor and employment planning, effective benefits (health, retirement, etc.) and beyond. Amazon Economists build econometric models using our world class data systems and apply approaches from a variety of skillsets – applied macro/time series, applied micro, econometric theory, empirical IO, empirical health, labor, public economics and related fields are all highly valued skillsets at Amazon. You will work in a fast moving environment to solve business problems as a member of either a cross-functional team embedded within a business unit or a central science and economics organization. You will be expected to develop techniques that apply econometrics to large data sets, address quantitative problems, and contribute to the design of automated systems around the company.
US, WA, Seattle
Amazon.com strives to be Earth's most customer-centric company where customers can shop in our stores to find and discover anything they want to buy. We hire the world's brightest minds, offering them a fast paced, technologically sophisticated and friendly work environment. Economists at Amazon partner closely with senior management, business stakeholders, scientist and engineers, and economist leadership to solve key business problems ranging from Amazon Web Services, Kindle, Prime, inventory planning, international retail, third party merchants, search, pricing, labor and employment planning, effective benefits (health, retirement, etc.) and beyond. Amazon Economists build econometric models using our world class data systems and apply approaches from a variety of skillsets – applied macro/time series, applied micro, econometric theory, empirical IO, empirical health, labor, public economics and related fields are all highly valued skillsets at Amazon. You will work in a fast moving environment to solve business problems as a member of either a cross-functional team embedded within a business unit or a central science and economics organization. You will be expected to develop techniques that apply econometrics to large data sets, address quantitative problems, and contribute to the design of automated systems around the company.
US, WA, Seattle
Amazon.com strives to be Earth's most customer-centric company where customers can shop in our stores to find and discover anything they want to buy. We hire the world's brightest minds, offering them a fast paced, technologically sophisticated and friendly work environment. Economists at Amazon partner closely with senior management, business stakeholders, scientist and engineers, and economist leadership to solve key business problems ranging from Amazon Web Services, Kindle, Prime, inventory planning, international retail, third party merchants, search, pricing, labor and employment planning, effective benefits (health, retirement, etc.) and beyond. Amazon Economists build econometric models using our world class data systems and apply approaches from a variety of skillsets – applied macro/time series, applied micro, econometric theory, empirical IO, empirical health, labor, public economics and related fields are all highly valued skillsets at Amazon. You will work in a fast moving environment to solve business problems as a member of either a cross-functional team embedded within a business unit or a central science and economics organization. You will be expected to develop techniques that apply econometrics to large data sets, address quantitative problems, and contribute to the design of automated systems around the company.
US, WA, Seattle
Amazon.com strives to be Earth's most customer-centric company where customers can shop in our stores to find and discover anything they want to buy. We hire the world's brightest minds, offering them a fast paced, technologically sophisticated and friendly work environment. Economists at Amazon partner closely with senior management, business stakeholders, scientist and engineers, and economist leadership to solve key business problems ranging from Amazon Web Services, Kindle, Prime, inventory planning, international retail, third party merchants, search, pricing, labor and employment planning, effective benefits (health, retirement, etc.) and beyond. Amazon Economists build econometric models using our world class data systems and apply approaches from a variety of skillsets – applied macro/time series, applied micro, econometric theory, empirical IO, empirical health, labor, public economics and related fields are all highly valued skillsets at Amazon. You will work in a fast moving environment to solve business problems as a member of either a cross-functional team embedded within a business unit or a central science and economics organization. You will be expected to develop techniques that apply econometrics to large data sets, address quantitative problems, and contribute to the design of automated systems around the company.
US, WA, Seattle
Economists in the Forecasting, Macroeconomics & Finance field document, interpret and forecast Amazon business dynamics. This track is well suited for economists adept at combining times-series statistical methods with strong economic analysis and intuition. This track could be a good fit for candidates with research experience in: macroeconometrics and/or empirical macroeconomics; international macroeconomics; time-series econometrics; forecasting; financial econometrics and/or empirical finance; and the use of micro and panel data to improve and validate traditional aggregate models. Economists at Amazon are expected to work directly with our senior management and scientists from other fields on key business problems faced across Amazon, including retail, cloud computing, third party merchants, search, Kindle, streaming video, and operations. The Forecasting, Macroeconomics & Finance field utilizes methods at the frontier of economics to develop formal models to understand the past and the present, predict the future, and identify relevant risks and opportunities. For example, we analyze the internal and external drivers of growth and profitability and how these drivers interact with the customer experience in the short, medium and long-term. We build econometric models of dynamic systems, using our world class data tools, formalizing problems using rigorous science to solve business issues and further delight customers.
US, WA, Seattle
Amazon.com strives to be Earth's most customer-centric company where customers can shop in our stores to find and discover anything they want to buy. We hire the world's brightest minds, offering them a fast paced, technologically sophisticated and friendly work environment. Economists at Amazon partner closely with senior management, business stakeholders, scientist and engineers, and economist leadership to solve key business problems ranging from Amazon Web Services, Kindle, Prime, inventory planning, international retail, third party merchants, search, pricing, labor and employment planning, effective benefits (health, retirement, etc.) and beyond. Amazon Economists build econometric models using our world class data systems and apply approaches from a variety of skillsets – applied macro/time series, applied micro, econometric theory, empirical IO, empirical health, labor, public economics and related fields are all highly valued skillsets at Amazon. You will work in a fast moving environment to solve business problems as a member of either a cross-functional team embedded within a business unit or a central science and economics organization. You will be expected to develop techniques that apply econometrics to large data sets, address quantitative problems, and contribute to the design of automated systems around the company.
US, WA, Seattle
Amazon.com strives to be Earth's most customer-centric company where customers can shop in our stores to find and discover anything they want to buy. We hire the world's brightest minds, offering them a fast paced, technologically sophisticated and friendly work environment. Economists at Amazon partner closely with senior management, business stakeholders, scientist and engineers, and economist leadership to solve key business problems ranging from Amazon Web Services, Kindle, Prime, inventory planning, international retail, third party merchants, search, pricing, labor and employment planning, effective benefits (health, retirement, etc.) and beyond. Amazon Economists build econometric models using our world class data systems and apply approaches from a variety of skillsets – applied macro/time series, applied micro, econometric theory, empirical IO, empirical health, labor, public economics and related fields are all highly valued skillsets at Amazon. You will work in a fast moving environment to solve business problems as a member of either a cross-functional team embedded within a business unit or a central science and economics organization. You will be expected to develop techniques that apply econometrics to large data sets, address quantitative problems, and contribute to the design of automated systems around the company.
US, WA, Seattle
Amazon.com strives to be Earth's most customer-centric company where customers can shop in our stores to find and discover anything they want to buy. We hire the world's brightest minds, offering them a fast paced, technologically sophisticated and friendly work environment. Economists in the Forecasting, Macroeconomics & Finance field document, interpret and forecast Amazon business dynamics. This track is well suited for economists adept at combining times-series statistical methods with strong economic analysis and intuition. This track could be a good fit for candidates with research experience in: macroeconometrics and/or empirical macroeconomics; international macroeconomics; time-series econometrics; forecasting; financial econometrics and/or empirical finance; and the use of micro and panel data to improve and validate traditional aggregate models. Economists at Amazon are expected to work directly with our senior management and scientists from other fields on key business problems faced across Amazon, including retail, cloud computing, third party merchants, search, Kindle, streaming video, and operations. The Forecasting, Macroeconomics & Finance field utilizes methods at the frontier of economics to develop formal models to understand the past and the present, predict the future, and identify relevant risks and opportunities. For example, we analyze the internal and external drivers of growth and profitability and how these drivers interact with the customer experience in the short, medium and long-term. We build econometric models of dynamic systems, using our world class data tools, formalizing problems using rigorous science to solve business issues and further delight customers.