# Explaining changes in real-world data

## New method identifies which causal factors contribute most to observed changes in probability distributions.

The success of deep learning is a testament to the power of statistical correlation: if certain image features are consistently correlated with the label “cat”, you can teach a machine learning model to identify cats.

But sometimes, correlation is not enough; you need to identify causation. For example, during the COVID-19 pandemic, a retailer might have seen a sharp decline in its inventory for a particular product. What caused that decline? An increase in demand? A shortage in supply? Delays in shipping? The failure of a forecasting model? The remedy might vary depending on the cause.

Earlier this month, at the International Conference on Artificial Intelligence and Statistics (AISTATS), my colleagues and I presented a new technique for identifying the causes of shifts in a probability distribution. Our approach involves causal graphs, which are graphical blueprints of sequential processes.

Each node of the graph, together with its incoming edges, represents a causal mechanism, or the probability that a given event will follow from the event that precedes it. We show how to compute the contribution that changes in the individual mechanisms make to changes in the probability of the final outcome.

We tested our approach using simulated data, so that we could stipulate the probabilities of the individual causal mechanisms, giving us a ground truth to measure against. Our approach yielded estimates that were very close to the ground truth — a deviation of only 0.29 according to L1 distance. And we achieved that performance even at small sample sizes — as few as 500 samples, drawn at random from the probability distributions we stipulated.

Consider a causal graph that represents the factors contributing to the amount of inventory a retailer has on hand. (This is a drastic simplification; the causal graphs for real-world inventory counts might have dozens of factors, rather than five.)

Each input-output relation in this network has an associated conditional probability distribution, or causal mechanism. The probabilities associated with the individual causal mechanisms determine the joint distribution of all the variables (X1-X5), or the probability that any given combination of variables will occur together. That in turn determines the probability distribution of the target variable — the amount of inventory on hand.
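The relationship between the causal mechanisms and the joint distribution can be written compactly. The following is a sketch that assumes discrete variables; the notation $\mathrm{PA}_j$ for the parents of node $X_j$ in the graph and the index $t$ for the target variable are introduced here for illustration and do not appear in the article.

$$
P(X_1, \dots, X_5) = \prod_{j=1}^{5} P\left(X_j \mid \mathrm{PA}_j\right)
$$

$$
P(X_t) = \sum_{x_j \,:\, j \neq t} P(X_1 = x_1, \dots, X_5 = x_5)
$$

Each factor $P(X_j \mid \mathrm{PA}_j)$ is one causal mechanism, so replacing a single factor changes the joint distribution and, through the sum, the distribution of the target.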

A large change to the final outcome may be accompanied by changes to all the causal mechanisms in the graph. Our technique identifies the causal mechanism whose change is most responsible for the change in outcome.

Our fundamental insight is that any given causal mechanism in the graph could, in principle, change without affecting the others. So given a causal graph, the initial causal mechanisms, and data that imply new causal mechanisms, we update the causal mechanisms one by one to determine the influence each has on the outcome.

The problem with this approach is that our measurement of each node’s contribution depends on the order in which we update the nodes. The measurement evaluates the consequences of changing the node’s causal mechanism given every possible value of the other variables in the graph. But the probabilities of those values change when we update causal mechanisms. So we’ll get different measurements, depending on which causal mechanisms have been updated.
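A small numerical sketch may make the order dependence concrete. It assumes a hypothetical two-node graph X -> Y with binary variables and made-up probabilities, and it summarizes the outcome by P(Y=1), which is an illustrative choice rather than the measure used in the paper.

```python
# Hypothetical two-node graph X -> Y with binary variables.
# All probabilities are invented for illustration.
OLD = {"X": 0.5, "Y": {0: 0.2, 1: 0.8}}  # P(X=1) and P(Y=1 | X)
NEW = {"X": 0.7, "Y": {0: 0.2, 1: 0.6}}


def prob_y(updated):
    """P(Y=1) when the mechanisms named in `updated` come from NEW."""
    p_x = (NEW if "X" in updated else OLD)["X"]
    p_y = (NEW if "Y" in updated else OLD)["Y"]
    return (1 - p_x) * p_y[0] + p_x * p_y[1]


def contributions(order):
    """Change in P(Y=1) credited to each node when updated in `order`."""
    updated, result = set(), {}
    for node in order:
        before = prob_y(updated)
        updated.add(node)
        result[node] = prob_y(updated) - before
    return result


x_first = contributions(["X", "Y"])
y_first = contributions(["Y", "X"])
print(x_first)
print(y_first)
```

With these numbers, X is credited with a change of about +0.12 when it is updated first but only about +0.08 when it is updated after Y, even though the total change (about -0.02) is the same in both orders.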

To address this problem, we run through every permutation of the update order and average the per-node results, an adaptation of a technique from game theory called computing the Shapley value.
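The averaging step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a hypothetical binary chain X1 -> X2 -> X3 with invented probabilities, summarizes the outcome by P(X3=1), and enumerates every permutation exactly, which is only practical for small graphs.

```python
from itertools import permutations

# Hypothetical binary chain X1 -> X2 -> X3; probabilities are invented.
OLD = {
    "X1": 0.5,               # P(X1=1)
    "X2": {0: 0.2, 1: 0.8},  # P(X2=1 | X1)
    "X3": {0: 0.1, 1: 0.9},  # P(X3=1 | X2)
}
NEW = {
    "X1": 0.7,
    "X2": {0: 0.2, 1: 0.6},
    "X3": {0: 0.1, 1: 0.9},  # unchanged mechanism
}


def target_prob(updated):
    """P(X3=1) when mechanisms in `updated` use NEW and the rest use OLD."""
    m = {k: (NEW if k in updated else OLD)[k] for k in OLD}
    p1 = m["X1"]
    p2 = (1 - p1) * m["X2"][0] + p1 * m["X2"][1]
    return (1 - p2) * m["X3"][0] + p2 * m["X3"][1]


def shapley_contributions():
    """Average each node's marginal contribution over all update orders."""
    nodes = list(OLD)
    orders = list(permutations(nodes))
    contrib = {n: 0.0 for n in nodes}
    for order in orders:
        updated = set()
        for node in order:
            before = target_prob(updated)
            updated.add(node)
            contrib[node] += target_prob(updated) - before
    return {n: c / len(orders) for n, c in contrib.items()}


shapley = shapley_contributions()
total_change = target_prob(set(OLD)) - target_prob(set())
print(shapley, total_change)
```

In this toy setup the contributions come out to roughly +0.08 for X1, -0.096 for X2, and 0 for the unchanged X3, and they sum to the total change in P(X3=1) of about -0.016. The number of permutations grows factorially with the number of nodes, so larger graphs would need sampling of permutations or another approximation.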

In practice, of course, causal mechanisms are something we have to infer from data; we’re not given probability distributions in advance. But to test our approach, we created a simple causal graph in which we could stipulate the distributions. Then, using those distributions, we generated data samples.

Across 100 different random changes to the causal mechanisms of our graph, our method performed very well; with 500 data samples per change, it achieved an average deviation from ground truth of 0.29 as measured by L1 distance. Our ground truth is at least a 3-D vector (6-D at most), with at least one component whose magnitude is at least one (five at most). So even in the worst case, an L1 distance of 0.29 is small relative to a component of magnitude 1.
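For readers unfamiliar with the metric, L1 distance is the sum of the absolute differences between corresponding components of two vectors. The sketch below uses made-up vectors, not values from the experiments, and the function name `l1_distance` is a hypothetical helper.

```python
def l1_distance(estimate, truth):
    """Sum of absolute component-wise differences between two vectors."""
    if len(estimate) != len(truth):
        raise ValueError("vectors must have the same length")
    return sum(abs(e - t) for e, t in zip(estimate, truth))


# Invented 3-D example: each component is off by 0.1.
truth = [1.0, 0.0, -0.5]
estimate = [0.9, 0.1, -0.4]
distance = l1_distance(estimate, truth)
print(distance)
```

Here the three component errors of 0.1 each add up to an L1 distance of about 0.3, which is comparable in size to the 0.29 average reported above.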

We tested different volumes of data samples, from 500 to 4,000, but adding more samples had little effect on the accuracy of the approximation.

Internally, we have also applied our technique to questions of supply chain management. For a particular family of products, we were able to identify the reasons for a steady decline in on-hand inventory during the pandemic, when that figure had held steady for the preceding year.
