Three challenges in machine-based reasoning

Translating from natural to structured language, defining truth, and definitive reasoning remain topics of central concern in automated reasoning, but Amazon Web Services’ new Automated Reasoning checks help address all of them.

Generative AI has made the past few years the most exhilarating time in my 30+-year career in the space of mechanized reasoning. Why? Because the computer industry and even the general public are now eager to talk about ideas that those of us working in logic have been passionate about for years. The challenges of language, syntax, semantics, validity, soundness, completeness, computational complexity, and even undecidability were previously too academic and obscure to be relevant to the masses. But all of that has changed. To those of you who are now discovering these topics: welcome! Step right in, we’re eager to work with you.

I thought it would be useful to share what I believe are the three most vexing aspects of making correct reasoning work in AI systems such as generative-AI-based chatbots. The launch of the Automated-Reasoning-checks capability in Bedrock Guardrails was in fact motivated by these challenges. But we are far from done: due to the inherent difficulty of these problems, we as a community (and we on the Automated-Reasoning-checks team) will be working on these challenges for years to come.

Difficulty #1: Translating from natural to structured language

Humans usually communicate with imprecise and ambiguous language. Often, we are able to infer disambiguating detail from context. In some cases, when it really matters, we will try to clarify with each other (“did you mean to say... ?”). In other cases, even when we really should, we won’t.

This is often a source of confusion and conflict. Imagine that an employer defines eligibility for an employee HR benefit as “having a contract of employment of 0.2 full-time equivalent (FTE) or greater”. Suppose I tell you that I “spend 20% of my time at work, except when I took time off last year to help a family member recover from surgery”. Am I eligible for the benefit? When I said I “spend 20% of my time at work”, does that mean I am spending 20% of my working time, under the terms of a contract?

My statement has multiple reasonable interpretations, each with different outcomes for benefit eligibility. One thing we do in Automated Reasoning checks is make multiple attempts to translate the natural language into query predicates, using complementary approaches. This mirrors a common interview technique: ask for the same information in different ways, and see if the facts stay consistent. In Automated Reasoning checks, we use solvers for formal logic systems to prove or disprove the equivalence of the different interpretations. If the translations differ at the semantic level, the application that uses Automated Reasoning checks can then ask for clarification (e.g., “Can you confirm that you have a contract of employment for 20% of full time or greater?”).
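To make the cross-checking idea concrete, here is a minimal sketch in Python. It is not how Automated Reasoning checks are implemented: the two candidate readings, the variable names, and the coarse finite domain are all assumptions made for illustration, and a real solver would reason symbolically rather than enumerate scenarios.

```python
from itertools import product

# Two hypothetical translations of "I spend 20% of my time at work".
# The readings and parameter names are illustrative assumptions.

def reading_contract(has_contract: bool, contract_fte_pct: int, time_at_work_pct: int) -> bool:
    """Reading A: the speaker holds a contract of at least 0.2 FTE."""
    return has_contract and contract_fte_pct >= 20

def reading_time_spent(has_contract: bool, contract_fte_pct: int, time_at_work_pct: int) -> bool:
    """Reading B: the speaker spends at least 20% of their time working, contract or not."""
    return time_at_work_pct >= 20

def find_disagreement(f, g):
    """Return one scenario on which f and g differ, or None if they agree everywhere."""
    percentages = range(0, 101, 10)  # coarse finite domain, for illustration only
    for scenario in product([False, True], percentages, percentages):
        if f(*scenario) != g(*scenario):
            return scenario
    return None

counterexample = find_disagreement(reading_contract, reading_time_spent)
if counterexample is not None:
    print("Translations differ; ask the user to clarify. Witness:", counterexample)
```

A witness scenario, such as someone with no contract who nonetheless spends 20% of their time working, is exactly the kind of case a clarifying question would need to resolve.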

Automated Reasoning checks use large language models to generate several possible translations of natural language into a formal language. Automated Reasoning checks flag discrepancies between the translations, which customers can resolve through natural-language interactions.

Difficulty #2: Defining truth

Something that never fails to amaze me is how difficult it is for groups of people to agree on the meanings of rules. Complex rules and laws often have subtle contradictions that can go unnoticed until someone tries to reach consensus on their interpretation. The United Kingdom’s Copyright, Designs and Patents Act 1988, for example, contains an inherent contradiction: it defines copyrightable works as those stemming from an author’s original intellectual creation, while simultaneously offering protection to works that require no creative human input, an incoherence that is particularly glaring in this age of AI-generated works.

A second source of trouble is that we always seem to be changing our rules. The US federal government’s per-diem rates, for example, change annually, requiring constant maintenance of any system that depends on those values.

Finally, few people deeply understand all of the corner cases of the rules that they are supposed to abide by. Consider the question of wearing earphones while driving: in some US states (e.g., Alaska), it’s illegal; in others (e.g., Florida), it’s legal to wear only one earphone; and in still others (e.g., Texas), it’s legal. In an informal poll, very few of my friends and colleagues were confident in their understanding of the legality of wearing earphones while driving in the place where they most recently drove a car.

Automated Reasoning checks address these challenges by helping customers define what the truth should be in their domains of interest (be they tax codes, HR policies, or other rule systems) and by providing mechanisms for refining those definitions over time, as the rules change. As generative-AI-based (GenAI-based) chatbots emerged, something that captured the imagination of many of us was the idea that complex rule systems could be made accessible to the general public through natural-language queries. In the future, chatbots could give direct and easy-to-understand answers to questions like “Can I make a U-turn when driving in Tokyo, Japan?”, and by addressing the challenge of defining truth, Automated Reasoning checks can help ensure that those answers are reliable.

The user interface for Automated Reasoning checks.

Difficulty #3: Definitive reasoning

Imagine we have a set of rules (let’s call it R) and a statement (S) we want to verify. For example, R might be Singapore’s driving code, and S might be a claim about U-turns at intersections in Singapore. We can encode R and S into Boolean logic, which computers understand, by combining Boolean variables in various ways.

Let’s say that encoding R and S needs just 500 bits, or about 63 characters. This is a tiny amount of information! But even when our encoding of the rule system is small enough to fit in a text message, the number of scenarios we’d need to check is astronomical. In principle, we must consider all 2^500 possible combinations before we can authoritatively declare S to be a true statement. A powerful computer today can perform hundreds of millions of operations in the time it takes you to blink. But even if we had all the computers in the world running at this blazing speed since the beginning of time, we still wouldn’t be close to checking all 2^500 possibilities today.
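The scale of that gap is easy to check with a back-of-the-envelope calculation. The sketch below uses Python's arbitrary-precision integers; the per-machine speed, the number of machines, and the age of the universe are rough assumptions chosen to be generous, not figures from this article.

```python
# Back-of-the-envelope comparison: scenarios to check vs. checks ever performed.
# All three constants below are rough, deliberately generous assumptions.
SCENARIOS = 2 ** 500
OPS_PER_SECOND_PER_MACHINE = 10 ** 9    # assumed: one billion checks per second
MACHINES = 10 ** 10                     # assumed: ten billion computers
SECONDS_SINCE_BIG_BANG = 44 * 10 ** 16  # roughly 13.8 billion years

checked = OPS_PER_SECOND_PER_MACHINE * MACHINES * SECONDS_SINCE_BIG_BANG

def magnitude(n: int) -> int:
    """Order of magnitude of a positive integer (number of decimal digits minus one)."""
    return len(str(n)) - 1

print(f"scenarios to check: about 10^{magnitude(SCENARIOS)}")
print(f"scenarios checked:  about 10^{magnitude(checked)}")
print(f"shortfall factor:   about 10^{magnitude(SCENARIOS // checked)}")
```

Even under these assumptions, exhaustive enumeration falls short by more than a hundred orders of magnitude, which is why brute force is not an option.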

Thankfully, the automated-reasoning community has developed a class of sophisticated tools, called SAT solvers, that make this type of combinatorial checking possible and remarkably fast in many (but not all) cases. Automated Reasoning checks make use of these tools when checking the validity of statements.
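As an illustration of the underlying idea, the sketch below implements a tiny DPLL-style satisfiability check in Python and uses it to test whether a statement S follows from rules R, by asking whether "R and not S" is unsatisfiable. It is a teaching sketch, not the solver used by Automated Reasoning checks; the two-variable rule set is an invented toy, not Singapore's actual driving code, and production SAT solvers add many refinements such as clause learning.

```python
from typing import List, Optional

Clause = List[int]  # DIMACS-style literals: 2 means x2 is true, -2 means x2 is false

def simplify(clauses: List[Clause], literal: int) -> Optional[List[Clause]]:
    """Assume `literal` is true; return the reduced clauses, or None on a conflict."""
    reduced = []
    for clause in clauses:
        if literal in clause:
            continue  # clause is already satisfied
        rest = [l for l in clause if l != -literal]
        if not rest:
            return None  # every literal in the clause is false
        reduced.append(rest)
    return reduced

def dpll(clauses: List[Clause]) -> bool:
    """Return True if the set of clauses is satisfiable."""
    if not clauses:
        return True
    if any(not clause for clause in clauses):
        return False
    # Unit propagation: a one-literal clause forces that literal.
    for clause in clauses:
        if len(clause) == 1:
            reduced = simplify(clauses, clause[0])
            return reduced is not None and dpll(reduced)
    # Otherwise branch on the first literal of the first clause.
    literal = clauses[0][0]
    for choice in (literal, -literal):
        reduced = simplify(clauses, choice)
        if reduced is not None and dpll(reduced):
            return True
    return False

def entails(rules: List[Clause], statement: int) -> bool:
    """S follows from R exactly when R together with NOT S is unsatisfiable."""
    return not dpll(rules + [[-statement]])

# Hypothetical toy encoding: variable 1 = "a sign permits U-turns here",
# variable 2 = "a U-turn is allowed here".
SIGN, ALLOWED = 1, 2
rules = [[-SIGN, ALLOWED], [SIGN, -ALLOWED]]  # allowed if and only if a sign permits it
facts = [[SIGN]]                              # at this intersection, the sign is present

print(entails(rules + facts, ALLOWED))   # S follows from R plus the facts
print(entails(rules, ALLOWED))           # without the fact, S is not established
```

Note that when the fact about the sign is omitted, neither "allowed" nor "not allowed" is entailed, so the rules alone do not settle the question.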

Unfortunately, not all problems can be encoded in a way that plays to the strengths of SAT solvers. For example, imagine a rule system has the provision “if every even number greater than 2 is the sum of two prime numbers, then the tax withholding rate is 30%; otherwise it’s 40%”. The problem is that to know the tax withholding rate, you need to know whether every even number greater than 2 is the sum of two prime numbers, and no one currently knows whether this is true. This statement is called Goldbach’s conjecture and has been an open problem since 1742. Still, while we don’t know the answer to Goldbach’s conjecture, we do know that it is either true or false, so we can definitively say that the tax withholding rate must be either 30% or 40%.
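The case analysis in that last step can be written out in a few lines of Python. This is only a sketch of the argument: the function name is hypothetical, and the unknown truth value of the conjecture is modeled as a Boolean parameter that we never actually evaluate.

```python
def withholding_rate(goldbach_holds: bool) -> int:
    """The toy provision from the text, as a function of the unknown fact."""
    return 30 if goldbach_holds else 40

# We cannot decide the conjecture, but we can reason over both possible cases.
possible_rates = {withholding_rate(case) for case in (True, False)}
print(sorted(possible_rates))
```

Whatever the truth of the conjecture turns out to be, the rate lies in this two-element set, even though we cannot say which element it is.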

It's also fun to think about whether it’s possible for a customer of Automated Reasoning checks to define a policy that is contingent on the output of Automated Reasoning checks. For instance, could the policy encode the rule “access is allowed if and only if Automated Reasoning checks say it is not allowed”? Here, no correct answer is possible, because the rule has created a contradiction by referring recursively to its own checking procedure. The best we can possibly do is answer “Unknown” (which is, in fact, what Automated Reasoning checks will answer in this instance).

The fact that a tool such as Automated Reasoning checks can return neither “true” nor “false” to statements like this was first identified by Kurt Gödel in 1931. What we know from Gödel’s result is that systems like Automated Reasoning checks can’t be both consistent and complete, so they must choose one. We have chosen to be consistent.
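The self-referential rule can be sketched the same way. The Python below is a toy model, not a description of how Automated Reasoning checks arrive at their answer: it simply tries both definite verdicts and finds that neither agrees with what the policy would then imply.

```python
def policy_allows(checker_says_allowed: bool) -> bool:
    """Toy rule from the text: access is allowed iff the checker says it is not."""
    return not checker_says_allowed

# A definite verdict is consistent only if it matches what the policy then implies.
consistent_verdicts = [v for v in (True, False) if policy_allows(v) == v]

# With no consistent definite verdict available, the only sound answer is "Unknown".
answer = consistent_verdicts[0] if len(consistent_verdicts) == 1 else "Unknown"
print(answer)
```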

These three difficulties (translating natural language into structured logic, defining truth in the context of ever-changing and sometimes contradictory rules, and tackling the complexity of definitive reasoning) are more than mere technical hurdles we face when we try to build AI systems with sound reasoning. They are problems that are deeply rooted in both the limitations of our technology and the intricacies of human systems.

With the launch of Automated Reasoning checks in Bedrock Guardrails on August 6, 2025, we are tackling these challenges through a combination of complementary approaches: applying cross-checking methods to translate from ambiguous natural language to logical predicates, providing flexible frameworks to help customers develop and maintain rule systems, and employing sophisticated SAT solvers while carefully handling cases where definitive answers are not possible. As we work to improve the performance of the product on these challenges, we are not only advancing technology but also deepening our understanding of the fundamental questions that have shaped reasoning itself, from Gödel’s incompleteness theorem to the evolving nature of legal and policy frameworks.

Given our commitment to providing sound reasoning, the road ahead in the AI space is challenging. Challenge accepted!
