Speeding database queries by rewriting redundancies

Amazon Athena reduces query execution time by 14% by eliminating redundant operations.

SQL database queries often include repetitions of the same operation. For instance, finding an entry in a table that corresponds to a particular person might involve pulling up all the entries with the person’s first name and all the entries with the person’s last name and computing their intersection. If the first- and last-name searches require querying the same database table twice, it’s a redundancy that can increase retrieval time.

In a paper we presented last week at the IEEE International Conference on Data Engineering (ICDE), we describe a method for rewriting complex SQL queries so as to eliminate such redundancies. Sometimes, that involves retrieving a superset of entries and then winnowing them down according to additional criteria. But in general, a little extra computation after retrieval is more efficient than multiple queries of the same table.

Athena rewrite.png
A query plan is the sequence of steps required to execute a SQL query. At left is a diagram depicting the standard query plan for a complex query in the TPC-DS dataset. At right is the much simpler query plan produced by the Amazon researchers' new rewriting rules.

In experiments on the TPC-DS benchmark database, with 3TB of data, our techniques improved the overall execution time on 99 queries by 14%, compared to the baseline. When restricted to those queries that are directly transformed by our rewrite rules, we observed 60% improvement in performance, with some queries executing more than six times as quickly.

Query rewrite

A query plan is a sequence of steps such as data scans and aggregations that are used to execute a query. Query plan optimization is the process of choosing the most efficient query plan from a large number of possible alternatives.

Related content
Two authors of Amazon Redshift research paper that will be presented at leading international forum for database researchers reflect on how far the first petabyte scale cloud data warehouse has advanced since it was announced ten years ago.

The focus of our work is to identify subqueries computing on overlapping data and fuse them into a single computation with compensating actions (post-retrieval computations) to reconstruct the original results. It does not require the subqueries to be syntactically the same or to produce the same output.

Consider the below query as an example:

WITH cte as (...complex_subquery...)
SELECT customer_id FROM cte WHERE fname = 'John'
UNION ALL
SELECT customer_id FROM cte WHERE lname = 'Smith'

The query uses the block cte twice in the FROM clause. This is suboptimal, especially if the duplicated computation is expensive. Our technique can identify such patterns and rewrite them. For instance, the above query becomes

WITH cte as (...complex_subquery...)
SELECT customer_id FROM cte, (VALUES (1), (2)) T(tag)
WHERE (fname = 'John' AND tag=1)
   OR (lname = 'Smith' AND tag=2)

Although the common expressions are not exactly the same (there are different filter conditions in the WHERE clause), we are able to rewrite the query into a fragment that generates a superset of the required rows and columns and handles the differences via compensating actions.

Note that in general, not all queries with repeated expressions can be rewritten by eliminating the duplicated work. However, beyond the query pattern shown here, there are several scenarios in which rewrites are applicable.

Building blocks of the rewriting rules

Related content
Amazon researchers describe new method for distributing database tables across servers.

Here are the primitives that will be used in our new query plan optimization rules. Specifically, we define a function Fuse that takes two input plans and returns either ⊥ (when fusion is not possible) or a 4-tuple fused result. If Fuse(P1, P2) = (P, M, L, R), then

  • P is the resulting fused plan. The schema of P includes all output columns in P1 and, optionally, additional output columns from P2.
  • M is a mapping from the output columns of P2 to output columns of P.
  • L and R are two filter conditions defined over the output columns of P to restore P1 and P2, respectively.

Semantically, we can reconstruct P1 and P2 as follows:

P1 = ProjectoutCols(P1)(FilterL(P))
P2 = ProjectM(outCols(P2))(FilterR(P))

where outCols(P) denotes the output columns of plan P.

Fuse is a recursive function that can handle different operators such as table scan, filter, projection, join, aggregation, and distinct aggregation.

Optimization rules

We have introduced several optimization rules that rewrite the query plan based on the primitives defined above. New rules can be added if we find prevalent enough patterns, and a semantically equivalent representation is available.

Related content
Optimizing placement of configuration data ensures that it’s available and consistent during “network partitions”.

  • GroupByJoinToWindow
    This rule transforms a common pattern in which an expression is aggregated and joined back to itself to obtain additional information on the aggregated rows. Intuitively, it is a calculation that extends an input relation with aggregates computed on a subset of columns. Window functions operate in this manner and can be used to rewrite the original pattern.
  • JoinOnKeys
    This rule addresses a common pattern in which similar subqueries, which return different views of the same data, are self-joined together. Because of the existence of keys, each row from the left matches with at most one row from the right. Therefore, we are extending each row that matches with columns from both sides.

  • UnionAllOnJoin
    This rule handles scenarios in which customers combine results of two computations that are very similar overall but differ on a single table (e.g., they union together some analytical insight applied over different fact tables).
  • UnionAll
    This rule corresponds to the example query in the preceding section. It is a common pattern that customers use to compute a common expression and then union non-necessarily disjoint subsets of the result with different projections.

The work presented in the paper is already used in production. It is worth noting that although Athena benefits from it, the same techniques are applicable to other database systems, since they do not require implementing new operators or execution models.

We are glad to see that, as a result, our customers are running their queries faster and, because of less data scanned, lowering their bills.

Research areas

Related content

US, WA, Seattle
To ensure a great internship experience, please keep these things in mind. This is a full time internship and requires an individual to work 40 hours a week for the duration of the internship. Amazon requires an intern to be located where their assigned team is. Amazon is happy to provide relocation and housing assistance if you are located 50 miles or further from the office location. Do you have a strong machine learning background and want to help build new speech and language technology? Amazon is looking for PhD students who are ready to tackle some of the most interesting research problems on the leading edge of natural language processing. We are hiring in all areas of spoken language understanding: NLP, NLU, ASR, text-to-speech (TTS), and more! A successful candidate will be a self-starter comfortable with ambiguity, strong attention to detail, and the ability to work in a fast-paced, ever-changing environment. As an Applied Science Intern, you will develop and implement novel scalable algorithms and modeling techniques to advance the state-of-the-art in technology areas at the intersection of ML, NLP, search, and deep learning. You will work side-by-side with global experts in speech and language to solve challenging groundbreaking research problems on production scale data. The ideal candidate must have the ability to work with diverse groups of people and cross-functional teams to solve complex business problems. Amazon has positions available for Natural Language Processing & Speech Intern positions in multiple locations across the United States. Amazon fundamentally believes that scientific innovation is essential to being the most customer-centric company in the world. Please visit our website to stay updated with the research our teams are working on: https://www.amazon.science/research-areas/conversational-ai-natural-language-processing
US, WA, Seattle
To ensure a great internship experience, please keep these things in mind. This is a full time internship and requires an individual to work 40 hours a week for the duration of the internship. Amazon requires an intern to be located where their assigned team is. Amazon is happy to provide relocation and housing assistance if you are located 50 miles or further from the office location. The Research team at Amazon works passionately to apply cutting-edge advances in technology to solve real-world problems. Do you have a strong machine learning background and want to help build new speech and language technology? Do you welcome the challenge to apply optimization theory into practice through experimentation and invention? Would you love to help us develop the algorithms and models that power computer vision services at Amazon, such as Amazon Rekognition, Amazon Go, Visual Search, etc? At Amazon we hire research science interns to work in a number of domains including Operations Research, Optimization, Speech Technologies, Computer Vision, Robotics, and more! As an intern, you will be challenged to apply theory into practice through experimentation and invention, develop new algorithms using mathematical programming techniques for complex problems, implement prototypes and work with massive datasets. Amazon has a culture of data-driven decision-making, and the expectation is that analytics are timely, accurate, innovative and actionable. Amazon Science gives insight into the company’s approach to customer-obsessed scientific innovation. Amazon fundamentally believes that scientific innovation is essential to being the most customer-centric company in the world. It’s the company’s ability to have an impact at scale that allows us to attract some of the brightest minds in artificial intelligence and related fields. Amazon Scientist use our working backwards method to enrich the way we live and work. For more information on the Amazon Science community please visit https://www.amazon.science.
US, WA, Seattle
To ensure a great internship experience, please keep these things in mind. This is a full time internship and requires an individual to work 40 hours a week for the duration of the internship. Amazon requires an intern to be located where their assigned team is. Amazon is happy to provide relocation and housing assistance if you are located 50 miles or further from the office location. The Research team at Amazon works passionately to apply cutting-edge advances in technology to solve real-world problems. Do you have a strong machine learning background and want to help build new speech and language technology? Do you welcome the challenge to apply optimization theory into practice through experimentation and invention? Would you love to help us develop the algorithms and models that power computer vision services at Amazon, such as Amazon Rekognition, Amazon Go, Visual Search, etc? At Amazon we hire research science interns to work in a number of domains including Operations Research, Optimization, Speech Technologies, Computer Vision, Robotics, and more! As an intern, you will be challenged to apply theory into practice through experimentation and invention, develop new algorithms using mathematical programming techniques for complex problems, implement prototypes and work with massive datasets. Amazon has a culture of data-driven decision-making, and the expectation is that analytics are timely, accurate, innovative and actionable. Amazon Science gives insight into the company’s approach to customer-obsessed scientific innovation. Amazon fundamentally believes that scientific innovation is essential to being the most customer-centric company in the world. It’s the company’s ability to have an impact at scale that allows us to attract some of the brightest minds in artificial intelligence and related fields. Amazon Scientist use our working backwards method to enrich the way we live and work. For more information on the Amazon Science community please visit https://www.amazon.science.
CA, ON, Toronto
To ensure a great internship experience, please keep these things in mind. This is a full time internship and requires an individual to work 40 hours a week for the duration of the internship. Amazon requires an intern to be located where their assigned team is. Amazon is happy to provide relocation and housing assistance if you are located 50 miles or further from the office location. Are you a Masters student interested in machine learning, natural language processing, computer vision, automated reasoning, or robotics? We are looking for skilled scientists capable of putting theory into practice through experimentation and invention, leveraging science techniques and implementing systems to work on massive datasets in an effort to tackle never-before-solved problems. A successful candidate will be a self-starter comfortable with ambiguity, strong attention to detail, and the ability to work in a fast-paced, ever-changing environment. As an Applied Science Intern, you will own the design and development of end-to-end systems. You’ll have the opportunity to create technical roadmaps, and drive production level projects that will support Amazon Science. You will work closely with Amazon scientists, and other science interns to develop solutions and deploy them into production. The ideal scientist must have the ability to work with diverse groups of people and cross-functional teams to solve complex business problems. Amazon Science gives insight into the company’s approach to customer-obsessed scientific innovation. Amazon fundamentally believes that scientific innovation is essential to being the most customer-centric company in the world. It’s the company’s ability to have an impact at scale that allows us to attract some of the brightest minds in artificial intelligence and related fields. Our scientists use our working backwards method to enrich the way we live and work. For more information on the Amazon Science community please visit https://www.amazon.science.
CA, ON, Toronto
To ensure a great internship experience, please keep these things in mind. This is a full time internship and requires an individual to work 40 hours a week for the duration of the internship. Amazon requires an intern to be located where their assigned team is. Amazon is happy to provide relocation and housing assistance if you are located 50 miles or further from the office location. Are you a PhD student interested in machine learning, natural language processing, computer vision, automated reasoning, or robotics? We are looking for skilled scientists capable of putting theory into practice through experimentation and invention, leveraging science techniques and implementing systems to work on massive datasets in an effort to tackle never-before-solved problems. A successful candidate will be a self-starter comfortable with ambiguity, strong attention to detail, and the ability to work in a fast-paced, ever-changing environment. As an Applied Science Intern, you will own the design and development of end-to-end systems. You’ll have the opportunity to create technical roadmaps, and drive production level projects that will support Amazon Science. You will work closely with Amazon scientists, and other science interns to develop solutions and deploy them into production. The ideal scientist must have the ability to work with diverse groups of people and cross-functional teams to solve complex business problems. Amazon Science gives insight into the company’s approach to customer-obsessed scientific innovation. Amazon fundamentally believes that scientific innovation is essential to being the most customer-centric company in the world. It’s the company’s ability to have an impact at scale that allows us to attract some of the brightest minds in artificial intelligence and related fields. Our scientists use our working backwards method to enrich the way we live and work. For more information on the Amazon Science community please visit https://www.amazon.science.
US, WA, Seattle
To ensure a great internship experience, please keep these things in mind. This is a full time internship and requires an individual to work 40 hours a week for the duration of the internship. Amazon requires an intern to be located where their assigned team is. Amazon is happy to provide relocation and housing assistance if you are located 50 miles or further from the office location. We are looking for Masters or PhD students excited about working on Automated Reasoning or Storage System problems at the intersection of theory and practice to drive innovation and provide value for our customers. AWS Automated Reasoning teams deliver tools that are called billions of times daily. Amazon development teams are integrating automated-reasoning tools such as Dafny, P, and SAW into their development processes, raising the bar on the security, durability, availability, and quality of our products. AWS Automated Reasoning teams are changing how computer systems built on top of the cloud are developed and operated. AWS Automated Reasoning teams work in areas including: Distributed proof search, SAT and SMT solvers, Reasoning about distributed systems, Automating regulatory compliance, Program analysis and synthesis, Security and privacy, Cryptography, Static analysis, Property-based testing, Model-checking, Deductive verification, compilation into mainstream programming languages, Automatic test generation, and Static and dynamic methods for concurrent systems. AWS Storage Systems teams manage trillions of objects in storage, retrieving them with predictable low latency, building software that deploys to thousands of hosts, achieving 99.999999999% (you didn’t read that wrong, that’s 11 nines!) durability. AWS storage services grapple with exciting problems at enormous scale. Amazon S3 powers businesses across the globe that make the lives of customers better every day, and forms the backbone for applications at all scales and in all industries ranging from multimedia to genomics. This scale and data diversity requires constant innovation in algorithms, systems and modeling. AWS Storage Systems teams work in areas including: Error-correcting coding and durability modeling, system and distributed system performance optimization and modeling, designing and implementing distributed, multi-tenant systems, formal verification and strong, practical assurances of correctness, bits-IOPS-Watts: the interplay between computation, performance, and energy, data compression - both general-purpose and domain specific, research challenges with storage media, both existing and emerging, and exploring the intersection between storage and quantum technologies. As an Applied Science Intern, you will work closely with Amazon scientists and other science interns to develop solutions and deploy them into production. The ideal scientist must have the ability to work with diverse groups of people and cross-functional teams to solve complex business problems. A successful candidate will be a self-starter with strong attention to detail and the ability to thrive in a fast-paced, ever-changing environment who is comfortable with ambiguity. Amazon believes that scientific innovation is essential to being the world’s most customer-centric company. Our ability to have impact at scale allows us to attract some of the brightest minds in Automated Reasoning and related fields. Our scientists work backwards to produce innovative solutions that delight our customers. Please visit https://www.amazon.science (https://www.amazon.science/) for more information.
US, WA, Seattle
To ensure a great internship experience, please keep these things in mind. This is a full time internship and requires an individual to work 40 hours a week for the duration of the internship. Amazon requires an intern to be located where their assigned team is. Amazon is happy to provide relocation and housing assistance if you are located 50 miles or further from the office location. We are looking for PhD students excited about working on Automated Reasoning or Storage System problems at the intersection of theory and practice to drive innovation and provide value for our customers. AWS Automated Reasoning teams deliver tools that are called billions of times daily. Amazon development teams are integrating automated-reasoning tools such as Dafny, P, and SAW into their development processes, raising the bar on the security, durability, availability, and quality of our products. AWS Automated Reasoning teams are changing how computer systems built on top of the cloud are developed and operated. AWS Automated Reasoning teams work in areas including: Distributed proof search, SAT and SMT solvers, Reasoning about distributed systems, Automating regulatory compliance, Program analysis and synthesis, Security and privacy, Cryptography, Static analysis, Property-based testing, Model-checking, Deductive verification, compilation into mainstream programming languages, Automatic test generation, and Static and dynamic methods for concurrent systems. AWS Storage Systems teams manage trillions of objects in storage, retrieving them with predictable low latency, building software that deploys to thousands of hosts, achieving 99.999999999% (you didn’t read that wrong, that’s 11 nines!) durability. AWS storage services grapple with exciting problems at enormous scale. Amazon S3 powers businesses across the globe that make the lives of customers better every day, and forms the backbone for applications at all scales and in all industries ranging from multimedia to genomics. This scale and data diversity requires constant innovation in algorithms, systems and modeling. AWS Storage Systems teams work in areas including: Error-correcting coding and durability modeling, system and distributed system performance optimization and modeling, designing and implementing distributed, multi-tenant systems, formal verification and strong, practical assurances of correctness, bits-IOPS-Watts: the interplay between computation, performance, and energy, data compression - both general-purpose and domain specific, research challenges with storage media, both existing and emerging, and exploring the intersection between storage and quantum technologies. As an Applied Science Intern, you will work closely with Amazon scientists and other science interns to develop solutions and deploy them into production. The ideal scientist must have the ability to work with diverse groups of people and cross-functional teams to solve complex business problems. A successful candidate will be a self-starter with strong attention to detail and the ability to thrive in a fast-paced, ever-changing environment who is comfortable with ambiguity. Amazon believes that scientific innovation is essential to being the world’s most customer-centric company. Our ability to have impact at scale allows us to attract some of the brightest minds in Automated Reasoning and related fields. Our scientists work backwards to produce innovative solutions that delight our customers. Please visit https://www.amazon.science (https://www.amazon.science/) for more information.
US, WA, Seattle
To ensure a great internship experience, please keep these things in mind. This is a full time internship and requires an individual to work 40 hours a week for the duration of the internship. Amazon requires an intern to be located where their assigned team is. Amazon is happy to provide relocation and housing assistance if you are located 50 miles or further from the office location. Help us develop the algorithms and models that power computer vision services at Amazon, such as Amazon Rekognition, Amazon Go, Visual Search, and more! We are combining computer vision, mobile robots, advanced end-of-arm tooling and high-degree of freedom movement to solve real-world problems at huge scale. As an intern, you will help build solutions where visual input helps the customers shop, anticipate technological advances, work with leading edge technology, focus on highly targeted customer use-cases, and launch products that solve problems for Amazon customers. A successful candidate will be a self-starter comfortable with ambiguity, strong attention to detail, and the ability to work in a fast-paced, ever-changing environment. You will own the design and development of end-to-end systems and have the opportunity to write technical white papers, create technical roadmaps, and drive production level projects that will support Amazon Science. You will work closely with Amazon scientists, and other science interns to develop solutions and deploy them into production. The ideal scientist must have the ability to work with diverse groups of people and cross-functional teams to solve complex business problems. Amazon Science gives insight into the company’s approach to customer-obsessed scientific innovation. Amazon fundamentally believes that scientific innovation is essential to being the most customer-centric company in the world. It’s the company’s ability to have an impact at scale that allows us to attract some of the brightest minds in artificial intelligence and related fields. Amazon Scientist use our working backwards method to enrich the way we live and work. For more information on the Amazon Science community please visit https://www.amazon.science
US, WA, Seattle
To ensure a great internship experience, please keep these things in mind. This is a full time internship and requires an individual to work 40 hours a week for the duration of the internship. Amazon requires an intern to be located where their assigned team is. Amazon is happy to provide relocation and housing assistance if you are located 50 miles or further from the office location. Help us develop the algorithms and models that power computer vision services at Amazon, such as Amazon Rekognition, Amazon Go, Visual Search, and more! We are combining computer vision, mobile robots, advanced end-of-arm tooling and high-degree of freedom movement to solve real-world problems at huge scale. As an intern, you will help build solutions where visual input helps the customers shop, anticipate technological advances, work with leading edge technology, focus on highly targeted customer use-cases, and launch products that solve problems for Amazon customers. A successful candidate will be a self-starter comfortable with ambiguity, strong attention to detail, and the ability to work in a fast-paced, ever-changing environment. You will own the design and development of end-to-end systems and have the opportunity to write technical white papers, create technical roadmaps, and drive production level projects that will support Amazon Science. You will work closely with Amazon scientists, and other science interns to develop solutions and deploy them into production. The ideal scientist must have the ability to work with diverse groups of people and cross-functional teams to solve complex business problems. Amazon Science gives insight into the company’s approach to customer-obsessed scientific innovation. Amazon fundamentally believes that scientific innovation is essential to being the most customer-centric company in the world. It’s the company’s ability to have an impact at scale that allows us to attract some of the brightest minds in artificial intelligence and related fields. Amazon Scientist use our working backwards method to enrich the way we live and work. For more information on the Amazon Science community please visit https://www.amazon.science
US, WA, Seattle
To ensure a great internship experience, please keep these things in mind. This is a full time internship and requires an individual to work 40 hours a week for the duration of the internship. Amazon requires an intern to be located where their assigned team is. Amazon is happy to provide relocation and housing assistance if you are located 50 miles or further from the office location. Are you a Masters or PhD student interested in machine learning? We are looking for skilled scientists capable of putting Machine Learning theory into practice through experimentation and invention, leveraging machine learning techniques (such as random forest, Bayesian networks, ensemble learning, clustering, etc.), and implementing learning systems to work on massive datasets in an effort to tackle never-before-solved problems. A successful candidate will be a self-starter comfortable with ambiguity, strong attention to detail, and the ability to work in a fast-paced, ever-changing environment. As an Applied Science Intern, you will own the design and development of end-to-end systems. You’ll have the opportunity to create technical roadmaps, and drive production level projects that will support Amazon Science. You will work closely with Amazon scientists, and other science interns to develop solutions and deploy them into production. The ideal scientist must have the ability to work with diverse groups of people and cross-functional teams to solve complex business problems. Amazon Science gives insight into the company’s approach to customer-obsessed scientific innovation. Amazon fundamentally believes that scientific innovation is essential to being the most customer-centric company in the world. It’s the company’s ability to have an impact at scale that allows us to attract some of the brightest minds in artificial intelligence and related fields. Our scientists use our working backwards method to enrich the way we live and work. For more information on the Amazon Science community please visit https://www.amazon.science.