# A gentle introduction to automated reasoning

## Meet Amazon Science’s newest research area.

This week, Amazon Science added automated reasoning to its list of research areas. We made this change because of the impact that automated reasoning is having here at Amazon. For example, Amazon Web Services’ customers now have direct access to automated-reasoning-based features such as IAM Access Analyzer, S3 Block Public Access, or VPC Reachability Analyzer. We also see Amazon development teams integrating automated-reasoning tools into their development processes, raising the bar on the security, durability, availability, and quality of our products.

```bool f(unsigned int x, unsigned int y) {
return (x+y == y+x);
}```

Take a few moments to answer the question “Could f ever return false?” This is not a trick question: I’ve purposefully used a simple example to make a point.

To check the answer with exhaustive testing, we could try executing the following doubly nested test loop, which calls f on all possible pairs of values of the type unsigned int:

```#include<stdio.h>
#include<stdbool.h>
#include<limits.h>

bool f(unsigned int x, unsigned int y) {
return (x+y == y+x);
}

void main() {
for (unsigned int x=0;1;x++) {
for (unsigned int y=0;1;y++) {
if (!f(x,y)) printf("Error!\n");
if (y==UINT_MAX) break;
}
if (x==UINT_MAX) break;
}
}```

Unfortunately, even on modern hardware, this doubly nested loop will run for a very long time. I compiled it and ran it on a 2.6 GHz Intel processor for over 48 hours before giving up.

Why does testing take so long? Because UINT_MAX is typically 4,294,967,295, there are 18,446,744,065,119,617,025 separate f calls to consider. On my 2.6 GHz machine, the compiled test loop called f approximately 430 million times a second. But to test all 18 quintillion cases at this performance, we would need over 1,360 years.

When we show the above code to industry professionals, they almost immediately work out that f can't return false as long as the underlying compiler/interpreter and hardware are correct. How do they do that? They reason about it. They remember from their school days that x + y can be rewritten as y + x and conclude that f always returns true.

Re:Invent 2021 keynote address by Peter DeSantis, senior vice president for utility computing at Amazon Web Services
Skip to 15:49 for a discussion of Amazon Web Services' work on automated reasoning.

An automated reasoning tool does this work for us: it attempts to answer questions about a program (or a logic formula) by using known techniques from mathematics. In this case, the tool would use algebra to deduce that x + y == y + x can be replaced with the simple expression true.

Automated-reasoning tools can be incredibly fast, even when the domains are infinite (e.g., unbounded mathematical integers rather than finite C ints). Unfortunately, the tools may answer “Don’t know” in some instances. We'll see a famous example of that below.

The science of automated reasoning is essentially focused on driving the frequency of these “Don’t know” answers down as far as possible: the less often the tools report "Don't know" (or time out while trying), the more useful they are.

Today’s tools are able to give answers for programs and queries where yesterday’s tools could not. Tomorrow’s tools will be even more powerful. We are seeing rapid progress in this field, which is why at Amazon, we are increasingly getting so much value from it. In fact, we see automated reasoning forming its own Amazon-style virtuous cycle, where more input problems to our tools drive improvements to the tools, which encourages more use of the tools.

A slightly more complex example. Now that we know the rough outlines of what automated reasoning is, the next small example gives a slightly more realistic taste of the sort of complexity that the tools are managing for us.

```void g(int x, int y) {
if (y > 0)
while (x > y)
x = x - y;
}```

Or, alternatively, consider a similar Python program over unbounded integers:

```def g(x, y):
assert isinstance(x, int) and isinstance(y, int)
if y > 0:
while x > y:
x = x - y```

Try to answer this question: “Does g always eventually return control back to its caller?”

When we show this program to industry professionals, they usually figure out the right answer quickly. A few, especially those who are aware of results in theoretical computer science, sometimes mistakenly think that we can't answer this question, with the rationale “This is an example of the halting problem, which has been proved insoluble”. In fact, we can reason about the halting behavior for specific programs, including this one. We’ll talk more about that later.

Here’s the reasoning that most industry professionals use when looking at this problem:

1. In the case where y is not positive, execution jumps to the end of the function g. That’s the easy case.
2. If, in every iteration of the loop, the value of the variable x decreases, then eventually, the loop condition x > y will fail, and the end of g will be reached.
3. The value of x always decreases only if y is always positive, because only then does the update to x (i.e., x = x - y) decrease x. But y’s positivity is established by the conditional expression, so x always decreases.

The experienced programmer will usually worry about underflow in the x = x - y command of the C program but will then notice that x > y before the update to x and thus cannot underflow.

If you carried out the three steps above yourself, you now have a very intuitive view of the type of thinking an automated-reasoning tool is performing on our behalf when reasoning about a computer program. There are many nitty-gritty details that the tools have to face (e.g., heaps, stacks, strings, pointer arithmetic, recursion, concurrency, callbacks, etc.), but there’s also decades of research papers on techniques for handling these and other topics, along with various practical tools that put these ideas to work.

The main takeaway is that automated-reasoning tools are usually working through the three steps above on our behalf: Item 1 is reasoning about the program’s control structure. Item 2 is reasoning about what is eventually true within the program. Item 3 is reasoning about what is always true in the program.

Note that configuration artifacts such as AWS resource policies, VPC network descriptions, or even makefiles can be thought of as code. This viewpoint allows us to use the same techniques we use to reason about C or Python code to answer questions about the interpretation of configurations. It’s this insight that gives us tools like IAM Access Analyzer or VPC Reachability Analyzer.

## An end to testing?

As we saw above when looking at f and g, automated reasoning can be dramatically faster than exhaustive testing. With tools available today, we can show properties of f or g in milliseconds, rather than waiting lifetimes with exhaustive testing.

Can we throw away our testing tools now and just move to automated reasoning? Not quite. Yes, we can dramatically reduce our dependency on testing, but we will not be completely eliminating it any time soon, if ever. Consider our first example:

```bool f(unsigned int x, unsigned int y) {
return (x + y == y + x);
}```

Recall the worry that a buggy compiler or microprocessor could in fact cause an executable program constructed from this source code to return false. We might also need to worry about the language runtime. For example, the C math library or the Python garbage collector might have bugs that cause a program to misbehave.

What’s interesting about testing, and something we often forget, is that it’s doing much more than just telling us about the C or Python source code. It’s also testing the compiler, the runtime, the interpreter, the microprocessor, etc. A test failure could be rooted in any of those tools in the stack.

Automated reasoning, in contrast, is usually applied to just one layer of that stack — the source code itself, or sometimes the compiler or the microprocessor. What we find so valuable about reasoning is it allows us to clearly define both what we do know and what we do not know about the layer under inspection.

Furthermore, the models of the surrounding environment (e.g., the compiler or the procedure calling our procedure) used by the automated-reasoning tool make our assumptions very precise. Separating the layers of the computational stack helps make better use of our time, energy, and money and the capabilities of the tools today and tomorrow.

Unfortunately, we will almost always need to make assumptions about something when using automated reasoning — for example, the principles of physics that govern our silicon chips. Thus, testing will never be fully replaced. We will want to perform end-to-end testing to try and validate our assumptions as best we can.

## An impossible program

I previously mentioned that automated-reasoning tools sometimes return “Don’t know” rather than “yes” or “no”. They also sometimes run forever (or time out), thus never returning an answer. Let’s look at the famous "halting problem" program, in which we know tools cannot return “yes” or “no”.

Imagine that we have an automated-reasoning API, called terminates, that returns “yes” if a C function always terminates or “no” when the function could execute forever. As an example, we could build such an API using the tool described here (shameless self-promotion of author’s previous work). To get the idea of what a termination tool can do for us, consider two basic C functions, g (from above),

```void g(int x, int y) {
if (y > 0)
while (x > y)
x = x - y;
}```

and g2:

```void g2(int x, int y) {
while (x > y)
x = x - y;
}```

For the reasons we have already discussed, the function g always returns control back to its caller, so terminates(g) should return true. Meanwhile, terminates(g2) should return false because, for example, g2(5, 0) will never terminate.

Now comes the difficult function. Consider h:

```void h() {
if terminates(h) while(1){}
}```

Notice that it's recursive. What’s the right answer for terminates(h)? The answer cannot be "yes". It also cannot be "no". Why?

Imagine that terminates(h) were to return "yes". If you read the code of h, you’ll see that in this case, the function does not terminate because of the conditional statement in the code of h that will execute the infinite loop while(1){}. Thus, in this case, the terminates(h) answer would be wrong, because h is defined recursively, calling terminates on itself.

Similarly, if terminates(h) were to return "no", then h would in fact terminate and return control to its caller, because the if case of the conditional statement is not met, and there is no else branch. Again, the answer would be wrong. This is why the “Don’t know” answer is actually unavoidable in this case.

The program h is a variation of examples given in Turing’s famous 1936 paper on decidability and Gödel’s incompleteness theorems from 1931. These papers tell us that problems like the halting problem cannot be “solved”, if bysolved” we mean that the solution procedure itself always terminates and answers either “yes” or “no” but never “Don’t know”. But that is not the definition of “solved” that many of us have in mind. For many of us, a tool that sometimes times out or occasionally returns “Don’t know” but, when it gives an answer, always gives the right answer is good enough.

This problem is analogous to airline travel: we know it’s not 100% safe, because crashes have happened in the past, and we are sure that they will happen in the future. But when you land safely, you know it worked that time. The goal of the airline industry is to reduce failure as much as possible, even though it’s in principle unavoidable.

To put that in the context of automated reasoning: for some programs, like h, we can never improve the tool enough to replace the "Don't know" answer. But there are many other cases where today's tools answer "Don't know", but future tools may be able to answer "yes" or "no". The modern scientific challenge for automated-reasoning subject-matter experts is to get the practical tools to return “yes” or “no” as often as possible. As an example of current work, check out CMU professor and Amazon scholar Marijn Heule and his quest to solve the Collatz termination problem.

Another thing to keep in mind is that automated-reasoning tools are regularly trying to solve “intractable” problems, e.g., problems in the NP complexity class. Here, the same thinking applies that we saw in the case of the halting problem: automated-reasoning tools have powerful heuristics that often work around the intractability problem for specific cases, but those heuristics can (and sometimes do) fail, resulting in “Don’t know” answers or impractically long execution time. The science is to improve the heuristics to minimize that problem.

## Nomenclature

A host of names are used in the scientific literature to describe interrelated topics, of which automated reasoning is just one. Here’s a quick glossary:

• logic is a formal and mechanical system for defining what is true and untrue. Examples: propositional logic or first-order logic.
• theorem is a true statement in logic. Example: the four-color theorem.
• proof is a valid argument in logic of a theorem. Example: Gonthier's proof of the four-color theorem
• mechanical theorem prover is a semi-automated-reasoning tool that checks a machine-readable expression of a proof often written down by a human. These tools often require human guidance. Example: HOL-light, from Amazon researcher John Harrison
• Formal verification is the use of theorem proving when applied to models of computer systems to prove desired properties of the systems. Example: the CompCert verified C compiler
• Formal methods is the broadest term, meaning simply the use of logic to reason formally about models of systems.
• Automated reasoning focuses on the automation of formal methods.
• semi-automated-reasoning tool is one that requires hints from the user but still finds valid proofs in logic.

As you can see, we have a choice of monikers when working in this space. At Amazon, we’ve chosen to use automated reasoning, as we think it best captures our ambition for automation and scale. In practice, some of our internal teams use both automated and semi-automated reasoning tools, because the scientists we've hired can often get semi-automated reasoning tools to succeed where the heuristics in fully automated reasoning might fail. For our externally facing customer features, we currently use only fully automated approaches.

## Next steps

In this essay, I’ve introduced the idea of automated reasoning, with the smallest of toy programs. I haven’t described how to handle realistic programs, with heap or concurrency. In fact, there are a wide variety of automated-reasoning tools and techniques, solving problems in all kinds of different domains, some of them quite narrow. To describe them all and the many branches and sub-disciplines of the field (e.g. “SMT solving”, “higher-order logic theorem proving”, “separation logic”) would take thousands of blogs posts and books.

Automated reasoning goes back to the early inventors of computers. And logic itself (which automated reasoning attempts to solve) is thousands of years old. In order to keep this post brief, I’ll stop here and suggest further reading. Note that it’s very easy to get lost in the weeds reading depth-first into this area, and you could emerge more confused than when you started. I encourage you to use a bounded depth-first search approach, looking sequentially at a wide variety of tools and techniques in only some detail and then moving on, rather than learning only one aspect deeply.

### A fun deep track:

Some algorithms found in the automated theorem provers we use today date as far back as 1959, when Hao Wang used automated reasoning to prove the theorems from Principia Mathematica.

Research areas

## Related content

• June 01, 2023
Both secure multiparty computation and differential privacy protect the privacy of data used in computation, but each has advantages in different contexts.
• May 24, 2023
How ARA recipient Supreeth Shashikumar is using machine learning to help hospitals detect sepsis — before it’s too late.
• May 23, 2023
Enforcing a hierarchical clustering of semantically related labels improves performance on rare “long-tail” classification categories.

## Work with us

US, WA, Seattle
US, WA, Seattle
RO, Iasi
Amazon’s mission is to be earth’s most customer-centric company and our team is the guardian of our customer’s privacy. Amazon SDO Privacy engineering operates in Austin – TX, US and Iasi, Bucharest – Romania. Our mission is to develop services which will enable every Amazon service operating with personal data to satisfy the privacy rights of Amazon customers. We are working backwards from the customers and world-wide privacy regulations, think long term, and propose solutions which will assure Amazon Privacy compliance. Our external customers are world-wide customers of Amazon Retail Website, Amazon B2B services (e.g. Seller central, App / Skill Developers), and Amazon Subsidiaries. Our internal customers are services within Amazon who operate with personal data, Legal Representatives, and Customer Service Agents. You can opt-in for being part of one of the existing or newly formed engineering teams who will contribute to Amazon mission to meet external customers’ privacy rights: Personal Data Classification, The Right to be forgotten, The right of access, or Digital Markets Act – The Right of Portability. The ideal candidate has a great passion for data and an insatiable desire to learn and innovate. A commitment to team work, hustle and strong communication skills (to both business and technical partners) are absolute requirements. Creating reliable, scalable, and high-performance products requires a sound understanding of the fundamentals of Computer Science and practical experience building large-scale distributed systems. Your solutions will apply to all of Amazon’s consumer and digital businesses including but not limited to Amazon.com, Alexa, Kindle, Amazon Go, Prime Video and more. Key job responsibilities As an data scientist on our team, you will apply the appropriate technologies and best practices to autonomously solve difficult problems. You'll contribute to the science solution design, run experiments, research new algorithms, and find new ways of optimizing customer experience. Besides theoretical analysis and innovation, you will work closely with talented engineers and ML scientists to put your algorithms and models into practice. You will collaborate with partner teams including engineering, PMs, data annotators, and other scientists to discuss data quality, policy, and model development. Your work will directly impact the trust customers place in Amazon Privacy, globally.
JP, 13, Tokyo
The JP Economics team is a central science team working across a variety of topics in the JP Retail business and beyond. We work closely with JP business leaders to drive change at Amazon. We focus on solving long-term, ambiguous and challenging problems, while providing advisory support to help solve short-term business pain points. Key topics include pricing, product selection, delivery speed, profitability, and customer experience. We tackle these issues by building novel economic/econometric models, machine learning systems, and high-impact experiments which we integrate into business, financial, and system-level decision making. Our work is highly collaborative and we regularly partner with JP- EU- and US-based interdisciplinary teams. In this role, you will build ground-breaking, state-of-the-art causal inference models to guide multi-billion-dollar investment decisions around the global Amazon marketplaces. You will own, execute, and expand a research roadmap that connects science, business, and engineering and contributes to Amazon's long term success. As one of the first economists outside North America/EU, you will make an outsized impact to our international marketplaces and pioneer in expanding Amazon’s economist community in Asia. The ideal candidate will be an experienced economist in empirical industrial organization, labour economics, econometrics, or related structural/reduced-form causal inference fields. You are a self-starter who enjoys ambiguity in a fast-paced and ever-changing environment. You think big on the next game-changing opportunity but also dive deep into every detail that matters. You insist on the highest standards and are consistent in delivering results. Key job responsibilities Work with Product, Finance, Data Science, and Data Engineering teams across the globe to deliver data-driven insights and products for regional and world-wide launches. Innovate on how Amazon can leverage data analytics to better serve our customers through selection and pricing. Contribute to building a strong data science community in Amazon Asia.
GB, London