Mitigating social bias in knowledge graph embeddings

Method significantly reduces bias while maintaining comparable performance on machine learning tasks.

Question-answering systems frequently rely on knowledge graphs, large collections of facts about real-world entities (people, organizations, countries, etc.). To make use of the information in knowledge graphs, machine learning models often employ knowledge graph embeddings, vector representations of the entities in the graphs. 

A potential problem with this approach is that the distributions of data in knowledge graphs reflect current and historical social biases. For instance, most knowledge graphs include more male entities than female ones with the profession “banker”, or more “white American” entities than “African-American” entities with the profession “ballet dancer”.

If knowledge graph embeddings end up encoding these biases, so will the question-answering systems that use them. If a little girl talking to a chatbot asks, “What should I be when I grow up?”, a biased embedding might rule out possible answers that are predominantly associated with men in the knowledge graph. For some professions — “baritone”, for instance — that may be fine. But in other cases, the biases may be relics of a less egalitarian past.

A two-dimensional representation of our method for measuring bias in knowledge graph embeddings. In each diagram, the blue dots labeled person1 indicate the shift in an embedding as we tune its parameters. The orange arrows represent relation vectors and the orange dots the resulting sums. As we shift the gender relation toward maleness, the profession relation shifts away from nurse and closer to doctor.
Credit: Glynis Condon

Earlier this year, at the AKBC Workshop on Bias in Knowledge Graphs, we presented a paper that examines this problem. Using a standard embedding technique, we looked for correlations between the professions of people listed in Wikidata and demographic factors, such as gender, ethnicity, and religion, to see whether the embeddings do indeed encode harmful social biases. 

Following on from this, at last week’s Conference on Empirical Methods in Natural Language Processing (EMNLP), we presented “Debiasing knowledge graph embeddings”, in which we attempt to address this problem by developing a lightweight alteration to the standard method of training graph embeddings that reduces bias. 

As knowledge graph embeddings become more widely used within the machine learning community, we hope this work raises awareness of the biases they may encode and moves us closer to the goal of effective debiasing.

Knowledge graph embedding

Knowledge graphs generally store facts as triples consisting of a left entity, a relation, and a right entity.
Credit: Joseph Fisher

A standard knowledge graph represents data using triples, each of which consists of two entities and the relationship between them: for instance, the entities emmanuelle_charpentier and germany are related by the relation lives_in.

Knowledge graph embeddings represent the entities in a knowledge graph as points in a multidimensional space. The idea is that spatial relationships between the points encode the relationships captured by the graph. 

With the common embedding framework TransE, for instance, adding the vector representing the relationship lives_in to the point representing emmanuelle_charpentier should bring us close to the location of the point representing germany. 
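TransE scores a triple by how close the head vector plus the relation vector lands to the tail vector. A minimal sketch with hand-picked 2-D vectors; the values, and the extra entity france used as a contrast, are hypothetical and chosen purely for illustration, not learned embeddings:

```python
import numpy as np

# Toy, hand-picked 2-D embeddings (hypothetical values, not learned).
entity = {
    "emmanuelle_charpentier": np.array([0.0, 1.0]),
    "germany": np.array([1.0, 1.0]),
    "france": np.array([-1.0, 0.0]),
}
relation = {"lives_in": np.array([1.0, 0.0])}

def transe_score(head, rel, tail):
    """TransE plausibility: negative L2 distance between head + relation and tail."""
    return -float(np.linalg.norm(entity[head] + relation[rel] - entity[tail]))

# Higher (closer to zero) means a more plausible triple.
print(transe_score("emmanuelle_charpentier", "lives_in", "germany"))
print(transe_score("emmanuelle_charpentier", "lives_in", "france"))
```

Here emmanuelle_charpentier + lives_in lands exactly on germany, so that triple gets the best possible score, while france is farther away and scores lower.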

During training, the embedding model learns to make these spatial relationships as accurate as possible across all the triples in the knowledge graph. Among other applications, embeddings can be used for link prediction: inferring relationships between entities that are not yet recorded in the graph.
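The article does not spell out the training objective. One common way to train TransE, used in the original TransE work, is a margin-based ranking loss that compares each true triple with a "corrupted" triple whose tail has been swapped for a random entity. The sketch below computes that loss for a single pair; the integer ids, the toy vectors, and the margin of 1.0 are assumptions, and a real trainer would minimize the loss with gradient descent over many batches.

```python
import numpy as np

def transe_distance(h, r, t):
    """L2 distance between h + r and t; smaller means a more plausible triple."""
    return float(np.linalg.norm(h + r - t))

def corrupt_tail(triple, num_entities, rng):
    """Build a negative example by swapping the tail for a random different entity.
    Assumes num_entities >= 2, otherwise no different tail exists."""
    h, r, t = triple
    t_neg = int(rng.integers(num_entities))
    while t_neg == t:
        t_neg = int(rng.integers(num_entities))
    return (h, r, t_neg)

def margin_loss(pos, neg, ent, rel, margin=1.0):
    """Hinge loss: zero once the true triple is closer than the corrupted one by `margin`."""
    d_pos = transe_distance(ent[pos[0]], rel[pos[1]], ent[pos[2]])
    d_neg = transe_distance(ent[neg[0]], rel[neg[1]], ent[neg[2]])
    return max(0.0, margin + d_pos - d_neg)

# Toy usage (hypothetical ids): 3 entities and 1 relation, as 2-D vectors.
rng = np.random.default_rng(0)
ent = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
rel = np.array([[1.0, 0.0]])
pos = (0, 0, 1)
neg = corrupt_tail(pos, len(ent), rng)
print(neg, margin_loss(pos, neg, ent, rel))
```

Because entity 0 plus relation 0 lands exactly on entity 1, the true triple already beats either possible corruption by at least the margin, so the loss is zero; swapping the roles would produce a large positive loss that training would push down.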

Do trained knowledge graph embeddings encode social biases?

To see why knowledge graph embeddings might encode social biases, let’s look at the counts of male and female entities in Wikidata, the most extensive open-source knowledge graph.

Counts of male and female entities in Wikidata
Credit: Joseph Fisher

There are more than four times as many male entities in Wikidata as there are female ones, a reflection of long-standing social biases in the real world.

In our paper “Measuring social bias in knowledge graph embeddings”, we test whether such differences in counts become encoded in embeddings. To do this, we take the embedding of a human entity and tune it so that adding a relation vector, such as has_religion or has_gender, brings the result closer to the embedding of a particular right-hand attribute, such as “Catholic” or “female”.

The top 20 “most female” professions in Wikidata according to TransE embeddings. B_p denotes the bias score, C_fem the counts of female entities in the knowledge graph with these professions, and C_male the counts of male entities with these professions.
From “Measuring social bias in knowledge graph embeddings”

As we tune the embedding, we observe how the result of adding the has_profession vector changes. That is, for each potential profession, we determine whether the model assigns it to the person with greater or lesser probability as the embedding changes. 

Running this calculation across all humans and professions, we are able to identify the professions that the embeddings encode as the “most male” and the “most female”. The table above shows the top 20 “most female” professions according to our measure. (The number of entities in Wikidata with non-binary genders is comparatively negligible; although this represents another bias in the data, it also means that the resulting embeddings would be too noisy to yield meaningful results in our study.)
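The measurement described above can be sketched as follows, under stated assumptions: TransE scoring, a single gradient step of hypothetical size `step` on the person embedding toward the chosen attribute, and a bias score defined here as the average change in the profession score across people. The paper's exact definition of B_p may differ, and all function names and toy vectors are illustrative.

```python
import numpy as np

def transe_score(h, r, t):
    return -float(np.linalg.norm(h + r - t))

def grad_head(h, r, t):
    """Gradient of -||h + r - t|| with respect to h (assumes h + r != t)."""
    diff = h + r - t
    return -diff / np.linalg.norm(diff)

def profession_shift(h, r_gender, t_attr, r_prof, t_prof, step=0.1):
    """Change in a profession's score after nudging h toward the given gender attribute."""
    h_tuned = h + step * grad_head(h, r_gender, t_attr)
    return transe_score(h_tuned, r_prof, t_prof) - transe_score(h, r_prof, t_prof)

def rank_professions(people, professions, r_gender, t_attr, r_prof, top_k=20, step=0.1):
    """Average the shift over all people; return professions sorted from largest shift down."""
    bias = {
        name: float(np.mean([profession_shift(h, r_gender, t_attr, r_prof, t, step) for h in people]))
        for name, t in professions.items()
    }
    return sorted(bias.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

# Toy usage: two people, and two professions placed on opposite sides.
people = [np.array([0.0, 0.0]), np.array([0.0, 0.5])]
r_gender, r_prof = np.zeros(2), np.zeros(2)
t_female = np.array([1.0, 0.0])
professions = {"nurse": np.array([1.0, 0.0]), "doctor": np.array([-1.0, 0.0])}
ranking = rank_professions(people, professions, r_gender, t_female, r_prof)
print(ranking)
```

A positive average shift means the profession becomes more likely as the people are pushed toward “female”, so sorting by it yields a “most female” list; pushing toward the male attribute instead would produce the “most male” list.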

The top 20 “most male” professions in Wikidata according to TransE embeddings.
From “Measuring social bias in knowledge graph embeddings”

The differences in the counts of entities in the knowledge graph with these professions appear to translate to biases in the embeddings. There are some professions, such as “homekeeper”, that we would prefer were not associated with a particular gender; others, such as “woman of letters”, may be less controversial. 

We also calculate the top 20 “most male” professions, and the conclusions are similar.

Can we adjust the training of knowledge graph embeddings to reduce encoded biases?

In “Debiasing knowledge graph embeddings”, we turn our attention to reducing such biases and their potentially harmful consequences for downstream applications, such as chatbots. To do this, we train the embedding model not only on how faithfully it reconstructs triples but also on how closely its predictions for gender and other sensitive characteristics, such as religion, match an even distribution.

Put another way, we update the embedding of person1 so that it becomes impossible for the model to predict gender. If this is done precisely, it should also break correlations between gender and profession.

Our debiasing approach uses Kullback-Leibler (KL) divergence to measure how well a knowledge graph embedding scheme matches our target distributions.
Credit: Joseph Fisher
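The debiasing term can be sketched as a KL-divergence penalty between a target distribution over attribute values and the model's own predicted distribution. This is a minimal sketch under assumptions not stated in the article: the prediction is a softmax over TransE scores for each candidate attribute value, the target is uniform, the KL is taken as KL(target || prediction), and `total_loss` with its `weight` parameter is a hypothetical way of combining it with the triple loss. In practice the penalty would be minimized with an autodiff framework; this code only computes it.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return z / z.sum()

def attribute_distribution(h, r_attr, attr_tails):
    """Model's predicted distribution over attribute values, from TransE scores."""
    scores = -np.linalg.norm(h + r_attr - attr_tails, axis=1)
    return softmax(scores)

def kl_divergence(p, q):
    """KL(p || q) for strictly positive discrete distributions."""
    return float(np.sum(p * np.log(p / q)))

def debias_penalty(h, r_attr, attr_tails, target=None):
    """KL between a target distribution (uniform by default) and the model's prediction."""
    pred = attribute_distribution(h, r_attr, attr_tails)
    if target is None:
        target = np.full(len(attr_tails), 1.0 / len(attr_tails))
    return kl_divergence(target, pred)

def total_loss(triple_loss, h, r_attr, attr_tails, weight=1.0):
    """Hypothetical combined objective: reconstruction loss plus weighted debiasing penalty."""
    return triple_loss + weight * debias_penalty(h, r_attr, attr_tails)

r_gender = np.zeros(2)
gender_tails = np.array([[1.0, 0.0], [-1.0, 0.0]])  # toy vectors for two gender values
print(debias_penalty(np.array([0.0, 0.0]), r_gender, gender_tails))  # equidistant person
print(debias_penalty(np.array([1.0, 0.0]), r_gender, gender_tails))  # person leaning one way
```

A person embedding equidistant from both attribute values gives a uniform prediction and zero penalty; one that leans toward a value is penalized, which pushes training toward embeddings from which the attribute cannot be predicted.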

A potential drawback is that this approach prevents the model from using gender, religion, nationality, or ethnicity to predict noncontroversial triples. For instance, we may want the embeddings to reflect that a nun is more likely to be female than male.

To allow this, we introduce attribute embeddings. In cases where we wish to make use of sensitive information, we can simply add these attribute vectors back into the embeddings.

Here, we add the male attribute embedding back into the model for the profession nun but not for the profession doctor.
Credit: Joseph Fisher
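Re-adding attribute embeddings only where sensitive information is allowed can be sketched like this. It is a minimal sketch under assumptions: each sensitive attribute value has its own vector, those vectors are added to the debiased person embedding only for tails on a hypothetical allow-list (`ATTRIBUTE_ALLOWED`), and scoring is TransE. The names and toy vectors are illustrative, not the paper's implementation.

```python
import numpy as np

# Hypothetical policy: tails for which sensitive attributes may inform the score.
ATTRIBUTE_ALLOWED = {"nun"}

def transe_score(h, r, t):
    return -float(np.linalg.norm(h + r - t))

def score_profession(person_vec, person_attrs, attr_vectors, r_prof, tail_name, tail_vec):
    """Score (person, has_profession, tail), re-adding attribute vectors only where allowed."""
    h = person_vec
    if tail_name in ATTRIBUTE_ALLOWED:
        for attr in person_attrs:
            h = h + attr_vectors[attr]  # add the attribute embedding back in
    return transe_score(h, r_prof, tail_vec)

attr_vectors = {"female": np.array([1.0, 0.0])}  # toy attribute embedding
person = np.zeros(2)        # toy debiased person embedding
r_prof = np.zeros(2)
t = np.array([1.0, 0.0])    # same toy tail vector reused for both professions
print(score_profession(person, ["female"], attr_vectors, r_prof, "nun", t))
print(score_profession(person, ["female"], attr_vectors, r_prof, "doctor", t))
```

With identical tail vectors, only the allow-listed profession benefits from the attribute vector, so the debiased embedding alone decides every other profession.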

We evaluate our model against a basic TransE model with no debiasing (“Basic”) and against the debiasing approach of Bose et al., which uses previously proposed neural-network filters. We measure the usefulness of the embeddings for link prediction (using mean reciprocal rank, or MRR), their bias, and their training time.
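Mean reciprocal rank scores link prediction by ranking every candidate tail for a query (head, relation, ?) and averaging 1/rank of the true tail. A minimal sketch assuming TransE scores and a "raw" ranking that does not filter out other true triples; the article does not say which evaluation protocol was used, and the toy vectors are hypothetical.

```python
import numpy as np

def tail_rank(h, r, true_idx, entity_matrix):
    """Rank (1 = best) of the true tail among all entities under TransE scores."""
    scores = -np.linalg.norm(h + r - entity_matrix, axis=1)
    # Ties are resolved optimistically: only strictly better candidates push the rank down.
    return 1 + int(np.sum(scores > scores[true_idx]))

def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test queries; 1.0 means every true tail ranked first."""
    return float(np.mean(1.0 / np.asarray(ranks, dtype=float)))

entities = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]])
h, r = np.array([0.0, 0.0]), np.array([1.0, 0.0])
ranks = [tail_rank(h, r, i, entities) for i in range(3)]
print(ranks, mean_reciprocal_rank(ranks))
```

An MRR of 0.68 therefore means the true answer typically sits at or near the top of the ranked candidates.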

During training, embeddings are scored on their accuracy, that is, the degree to which they reproduce the corresponding triples in the knowledge graph. We measure bias as the difference between those scores for entities that fall into different categories of a sensitive attribute, such as religion or gender. We find that our model incurs a slight drop in link prediction accuracy (roughly 3%, from an MRR of 0.68 to 0.66) in exchange for a dramatic reduction in bias.
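One simple reading of this bias measure is the absolute difference in mean triple score between two groups of entities. The sketch below implements that reading only; the paper's exact metric, and the scale of the numbers in the results table, may differ, and the per-person scores are hypothetical.

```python
import numpy as np

def group_score_gap(scores, groups, group_a, group_b):
    """Absolute difference in mean triple score between two groups of entities."""
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    mean_a = scores[groups == group_a].mean()
    mean_b = scores[groups == group_b].mean()
    return float(abs(mean_a - mean_b))

# Hypothetical scores for each person's true profession triple.
scores = [-0.2, -0.4, -1.1, -0.9]
genders = ["male", "male", "female", "female"]
print(group_score_gap(scores, genders, "male", "female"))
```

A perfectly debiased model would score both groups equally on average, giving a gap of zero.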

Model          MRR     Gender bias   Seconds per epoch
Basic          0.68    2.79          68.4
Bose et al.    0.426   2.75          533.3
Ours           0.66    0.19          89.4
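The “roughly 3%” drop in MRR can be checked directly from the table above, and the same arithmetic gives the relative change in bias and training time for each model compared with Basic. This only recomputes the published figures; the helper name is illustrative.

```python
# Figures copied from the results table.
results = {
    "Basic": {"mrr": 0.68, "gender_bias": 2.79, "sec_per_epoch": 68.4},
    "Bose et al.": {"mrr": 0.426, "gender_bias": 2.75, "sec_per_epoch": 533.3},
    "Ours": {"mrr": 0.66, "gender_bias": 0.19, "sec_per_epoch": 89.4},
}

def relative_change(new, old):
    """Fractional change from old to new (negative means a decrease)."""
    return (new - old) / old

base = results["Basic"]
for name, row in results.items():
    changes = {k: relative_change(row[k], base[k]) for k in base}
    print(name, {k: f"{v:+.1%}" for k, v in changes.items()})
```

Relative to Basic, our model's MRR falls by about 2.9% while gender bias falls by about 93%, and each epoch takes about 31% longer; the Bose et al. approach takes nearly eight times as long per epoch.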
