Cloud and systems

Speeding database queries by rewriting redundancies

Amazon Athena reduces query execution time by 14% by eliminating redundant operations.

May 19, 2022

4 min read

SQL database queries often include repetitions of the same operation. For instance, finding an entry in a table that corresponds to a particular person might involve pulling up all the entries with the person’s first name and all the entries with the person’s last name and computing their intersection. If the first- and last-name searches require querying the same database table twice, it’s a redundancy that can increase retrieval time.

In a paper we presented last week at the IEEE International Conference on Data Engineering (ICDE), we describe a method for rewriting complex SQL queries so as to eliminate such redundancies. Sometimes, that involves retrieving a superset of entries and then winnowing them down according to additional criteria. But in general, a little extra computation after retrieval is more efficient than multiple queries of the same table.

Athena rewrite.png — A *query plan* is the sequence of steps required to execute a SQL query. At left is a diagram depicting the standard query plan for a complex query in the TPC-DS dataset. At right is the much simpler query plan produced by the Amazon researchers' new rewriting rules.

In experiments on the TPC-DS benchmark database, with 3TB of data, our techniques improved the overall execution time on 99 queries by 14%, compared to the baseline. When restricted to those queries that are directly transformed by our rewrite rules, we observed 60% improvement in performance, with some queries executing more than six times as quickly.

Query rewrite

A query plan is a sequence of steps such as data scans and aggregations that are used to execute a query. Query plan optimization is the process of choosing the most efficient query plan from a large number of possible alternatives.

Amazon Redshift re-invented research paper and photos of Rahul Pathak, vice president of analytics at AWS, and Ippokratis Pandis, AWS senior principal engineer

Two authors of Amazon Redshift research paper that will be presented at leading international forum for database researchers reflect on how far the first petabyte scale cloud data warehouse has advanced since it was announced ten years ago.

The focus of our work is to identify subqueries computing on overlapping data and fuse them into a single computation with compensating actions (post-retrieval computations) to reconstruct the original results. It does not require the subqueries to be syntactically the same or to produce the same output.

Consider the below query as an example:

WITH cte as (...complex_subquery...)
SELECT customer_id FROM cte WHERE fname = 'John'
UNION ALL
SELECT customer_id FROM cte WHERE lname = 'Smith'

The query uses the block cte twice in the FROM clause. This is suboptimal, especially if the duplicated computation is expensive. Our technique can identify such patterns and rewrite them. For instance, the above query becomes

WITH cte as (...complex_subquery...)
SELECT customer_id FROM cte, (VALUES (1), (2)) T(tag)
WHERE (fname = 'John' AND tag=1)
   OR (lname = 'Smith' AND tag=2)

Although the common expressions are not exactly the same (there are different filter conditions in the WHERE clause), we are able to rewrite the query into a fragment that generates a superset of the required rows and columns and handles the differences via compensating actions.

Note that in general, not all queries with repeated expressions can be rewritten by eliminating the duplicated work. However, beyond the query pattern shown here, there are several scenarios in which rewrites are applicable.

Building blocks of the rewriting rules

Optimization rules

We have introduced several optimization rules that rewrite the query plan based on the primitives defined above. New rules can be added if we find prevalent enough patterns, and a semantically equivalent representation is available.

Speeding database queries by rewriting redundancies

Amazon Athena reduces query execution time by 14% by eliminating redundant operations.

Query rewrite

Building blocks of the rewriting rules

Optimization rules

Related content

Work with us