How automated reasoning improves the Prime Video experience
In a pilot study, an automated code checker found about 100 possible errors, 80% of which turned out to require correction.
Automated reasoning is the ability of machines to make logical deductions. One common application of automated reasoning is in software verification, or establishing that computer programs will do what they’re supposed to. While this has been an area of active research for five decades, it is only very recently that verification techniques have become applicable to industrial code bases with millions of lines of code.
Since 2019, we in the Prime Video Automated Reasoning group have been creating software development tools that use these verification techniques to give Amazon developers greater confidence in the code they write for the Prime Video App, the app that controls video playback for all Prime Video content.
The Prime Video App provides a uniform end-user experience regardless of the type of content, from movies on demand to live streaming of major sports events. Because the application has multiple components developed by dozens of independent teams worldwide in a range of programming languages, and it has to run on thousands of different hardware configurations, it provides a particularly tough environment for automated reasoning.
Since March 26, 2021, all Prime Video developers have been using an automated-reasoning bot called BugBear, which conducts automatic code reviews, providing clear and actionable comments when it detects potential issues — or confirming that no issues were found. BugBear has analyzers for C/C++, Java, and TypeScript, the main languages used in Prime Video.
In a pilot campaign that started in the second half of 2020, BugBear conducted more than 1,000 automated code reviews and identified potential issues in about 100 of them. Prime Video developers agreed that approximately 80% of those issues required code modification.
BugBear is implemented entirely using Amazon Web Services, and it provides feedback on developers’ code within 15 minutes. It can identify both generic issues and violations of Prime-Video-specific business-logic properties, something that we call code contracts.
A generic issue is a problem in the code that is always considered wrong, irrespective of what the application is supposed to do — for instance, not closing a file after opening it, or trying to use the value of a variable that has not been previously initialized.
A code contract is usually a restriction on possible behaviors of the code or a specific business rule that needs to be implemented; think of rules such as “before enabling a touch-screen keyboard, check that the device supports a touch screen as an input mechanism”.
BugBear’s analyzers are static-analysis tools, meaning that they do not need to execute the code to find issues. In the case of C/C++ and Java source code, we employ an existing tool called Infer, adapted for Prime Video, to detect generic issues such as memory errors and concurrency problems (problems that arise when multiple processes operate in parallel on shared variables).
For C++ and TypeScript contracts, we have developed our own tool that encodes the program under analysis as a database of logical facts. Using a custom-built set of rules, it can deduce properties of the program automatically. We employ the declarative programming language Datalog to represent both the facts and the rules.
Suppose, for instance, that a contract requires that, in function F, the function open_resource() should always be called before the function use_resource(). In Datalog, the corresponding rule might look like this:
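The original rule is not reproduced here; the following is an illustrative sketch in Soufflé-style Datalog, in which the relation names (calls, called_before, violation) and the fact schema are assumptions made for the example, not BugBear’s actual encoding:

```datalog
// calls(caller, callee): a fact extracted from the source code,
// meaning that function caller contains a direct call to callee.
.decl calls(caller: symbol, callee: symbol)

// called_before(f, g): a call to f can precede a call to g.
.decl called_before(f: symbol, g: symbol)

// The contract is violated if F calls use_resource() but no call
// to open_resource() can happen first.
.decl violation(f: symbol)
.output violation
violation("F") :-
    calls("F", "use_resource"),
    !called_before("open_resource", "use_resource").
```

Given the extracted facts, a Datalog engine evaluates the rule and reports every function for which the violation relation is non-empty.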
The key issue here is the construction of the relation called_before, which imposes constraints on the shape of the so-called call graph, a representation of the possible sequences of function calls during a program’s execution.
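As a sketch of one possible encoding (again with assumed relation names, not necessarily the tool’s actual rules), called_before can be derived from per-function ordering facts combined with transitive reachability over the call-graph edges:

```datalog
// edge(x, y): the call graph contains an edge from function x to y.
.decl edge(x: symbol, y: symbol)

// reaches(x, y): y is reachable from x in the call graph,
// computed as the transitive closure of edge.
.decl reaches(x: symbol, y: symbol)
reaches(x, y) :- edge(x, y).
reaches(x, z) :- reaches(x, y), edge(y, z).

// precedes(f, c1, c2): inside the body of function f, a call site
// of c1 appears before a call site of c2 (extracted from source).
.decl precedes(f: symbol, c1: symbol, c2: symbol)

// g may be called before h if some function orders them directly,
// or first calls a function a that (transitively) reaches g.
.decl called_before(g: symbol, h: symbol)
called_before(g, h) :- precedes(_, g, h).
called_before(g, h) :- precedes(_, a, h), reaches(a, g).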
Constructing an exact call graph is an undecidable problem: no algorithm can produce a finite graph that correctly captures all possible function calls in an arbitrary program. Consequently, we can evaluate our rule only over approximations of the call graph. Balancing the precision of these approximations against their performance has been among the main topics of our research in delivering BugBear.
In ongoing work, we are building a system that will enable users to prove assertions on demand during code review. For instance, users will be able to write “BugBear assertions” in their code, which our tools will analyze automatically. These assertions will also help scale the analysis, since we will be able to focus only on the code relevant to proving each assertion.