Amazon EBS addresses the challenge of the CAP Theorem at scale

Optimizing placement of configuration data ensures that it’s available and consistent during “network partitions”.

Amazon Elastic Block Store (EBS) is a high-performance, cloud-based block storage system designed to work with Amazon Elastic Compute Cloud (EC2). EC2 instances are secure, resizable allotments of computing capacity running in the cloud. EBS allows customers to create high-performance storage volumes and attach them to their EC2 instances. These volumes behave much like local hard drives on your PC.

Just as Amazon Web Services (AWS) builds new services that customers use directly, such as AWS Outposts and Graviton2 instances, we are also constantly investing in the distributed microsystems within the architectures of our services. In a paper my colleagues and I are presenting this month at the USENIX Symposium on Networked Systems Design and Implementation (NSDI ’20), we describe an EBS-specialized data store we call Physalia, which helps AWS data centers recover from communications interruptions, with minimal customer impact.

Physalia is one of the ways we’ve continued to innovate in EBS, which has been delivering block storage for AWS customers for more than 10 years.

Each EC2 instance runs in an Availability Zone, which is one or more data centers with redundant power, networking, and connectivity. EBS volumes in an Availability Zone are distributed across a number of storage servers and are architected to handle network partitions, or impairments in communication links between EBS servers and EC2 instances (such as a damaged optical cable running between two data centers).

EBS maintains availability during partitions through replication technology. EBS stores each piece of data on multiple servers, using a fault-tolerant replication protocol. When a network partition occurs, affected servers contact a distributed service called the configuration master.

The master stores a small amount of configuration data indicating which servers hold the data for a given volume and the order in which they replicate it, which is important for identifying up-to-date data. The replication protocol uses the configuration data to decide where application data should be stored, and it updates the configuration to point to the application data’s new location. Physalia is designed to play the role of the configuration master.

Data stores have two chief reliability criteria: availability and consistency. Availability means that every query to the database should result in an answer in a reasonable amount of time. Consistency means that the results of database reads and writes should reflect the order in which they were issued. If user A writes data to a location, and user B then retrieves data from the same location, B should retrieve what A wrote.

Circumventing CAP

A central theorem in network theory, called the CAP theorem, holds that in the face of a partition (the P in CAP), it’s possible to solve for either consistency or availability but not both. EBS needs to solve for both, and Physalia is able to offer strong statistical assurances of consistency and availability by combining a knowledge of data center and power topology and an architecture that uses the concept of the cell as a logical unit.

With Physalia, every EBS volume in an Availability Zone has its own cell, which consists of seven copies of the configuration data for that volume on seven separate servers. We call each copy of the configuration data a node, and a single physical server will typically store thousands of nodes.

Physalia cell
A Physalia cell, consisting of seven nodes (N’s), each of which can communicate with all the others.
Stacy Reilly

Availability is very important to AWS customers, and it’s part of how we design and build. When we think about availability, we're focused not only on making interruptions infrequent and short but also on limiting the impact to as small a subset of customers as possible. We call this idea “blast radius reduction”, and it is a core design tenet for Physalia. Physalia reduces blast radius by placing configuration data close to the servers that need it, when they need it.

In deciding where to place a given node, Physalia faces two competing demands. On the one hand, the nodes should be close together, to minimize the risk that they’ll be cut off from each other in the case of a network partition. On the other hand, they shouldn’t be too close together — or share a power supply — because you wouldn’t want a localized incident such as a rack failure to affect the entire cell. Our knowledge of the network topology and the power topology of the data center helps Physalia manage this trade-off in real time.

By keeping cells small and local, we can ensure that they are available to the instances and volumes that need them. Maintaining consistency within the cell ensures that the configuration data is accurate, and volume data is not corrupted.

The small size of the cells means, essentially, that Physalia is not a single database but a collection of millions of tiny databases. That’s the title of our paper: “Millions of Tiny Databases.” The assurance of cell-level consistency also explains Physalia’s name. The Portuguese man o’ war, Physalia physalis, is, contrary to appearance, not a single organism but a collection of organisms functioning together symbiotically.

In addition to evaluating Physalia empirically, we also verified its correctness using formal methods, a powerful tool that helps AWS researchers design, test, and verify systems at all scales. In the specification language TLA+, we defined the operational parameters of the Physalia system and mathematically verified that it reduces blast radius during network partitions, even in extremely unlikely edge cases.

Research areas

Related content

GB, London
Amazon Advertising is looking for a Data Scientist to join its brand new initiative that powers Amazon’s contextual advertising products. Advertising at Amazon is a fast-growing multi-billion dollar business that spans across desktop, mobile and connected devices; encompasses ads on Amazon and a vast network of hundreds of thousands of third party publishers; and extends across US, EU and an increasing number of international geographies. The Supply Quality organization has the charter to solve optimization problems for ad-programs in Amazon and ensure high-quality ad-impressions. We develop advanced algorithms and infrastructure systems to optimize performance for our advertisers and publishers. We are focused on solving a wide variety of problems in computational advertising like traffic quality prediction (robot and fraud detection), Security forensics and research, Viewability prediction, Brand Safety, Contextual data processing and classification. Our team includes experts in the areas of distributed computing, machine learning, statistics, optimization, text mining, information theory and big data systems. We are looking for a dynamic, innovative and accomplished Data Scientist to work on data science initiatives for contextual data processing and classification that power our contextual advertising solutions. Are you an experienced user of sophisticated analytical techniques that can be applied to answer business questions and chart a sustainable vision? Are you exited by the prospect of communicating insights and recommendations to audiences of varying levels of technical sophistication? Above all, are you an innovator at heart and have a track record of resolving ambiguity to deliver result? As a data scientist, you help our data science team build cutting edge models and measurement solutions to power our contextual classification technology. As this is a new initiative, you will get an opportunity to act as a thought leader, work backwards from the customer needs, dive deep into data to understand the issues, define metrics, conceptualize and build algorithms and collaborate with multiple cross-functional teams. Key job responsibilities * Define a long-term science vision for contextual-classification tech, driven fundamentally from the needs of our advertisers and publishers, translating that direction into specific plans for the science team. Interpret complex and interrelated data points and anecdotes to build and communicate this vision. * Collaborate with software engineering teams to Identify and implement elegant statistical and machine learning solutions * Oversee the design, development, and implementation of production level code that handles billions of ad requests. Own the full development cycle: idea, design, prototype, impact assessment, A/B testing (including interpretation of results) and production deployment. * Promote the culture of experimentation and applied science at Amazon. * Demonstrated ability to meet deadlines while managing multiple projects. * Excellent communication and presentation skills working with multiple peer groups and different levels of management * Influence and continuously improve a sustainable team culture that exemplifies Amazon’s leadership principles. We are open to hiring candidates to work out of one of the following locations: London, GBR
JP, 13, Tokyo
We are seeking a Principal Economist to be the science leader in Amazon's customer growth and engagement. The wide remit covers Prime, delivery experiences, loyalty program (Amazon Points), and marketing. We look forward to partnering with you to advance our innovation on customers’ behalf. Amazon has a trailblazing track record of working with Ph.D. economists in the tech industry and offers a unique environment for economists to thrive. As an economist at Amazon, you will apply the frontier of econometric and economic methods to Amazon’s terabytes of data and intriguing customer problems. Your expertise in building reduced-form or structural causal inference models is exemplary in Amazon. Your strategic thinking in designing mechanisms and products influences how Amazon evolves. In this role, you will build ground-breaking, state-of-the-art econometric models to guide multi-billion-dollar investment decisions around the global Amazon marketplaces. You will own, execute, and expand a research roadmap that connects science, business, and engineering and contributes to Amazon's long term success. As one of the first economists outside North America/EU, you will make an outsized impact to our international marketplaces and pioneer in expanding Amazon’s economist community in Asia. The ideal candidate will be an experienced economist in empirical industrial organization, labour economics, or related structural/reduced-form causal inference fields. You are a self-starter who enjoys ambiguity in a fast-paced and ever-changing environment. You think big on the next game-changing opportunity but also dive deep into every detail that matters. You insist on the highest standards and are consistent in delivering results. Key job responsibilities - Work with Product, Finance, Data Science, and Data Engineering teams across the globe to deliver data-driven insights and products for regional and world-wide launches. - Innovate on how Amazon can leverage data analytics to better serve our customers through selection and pricing. - Contribute to building a strong data science community in Amazon Asia. We are open to hiring candidates to work out of one of the following locations: Tokyo, 13, JPN
DE, BE, Berlin
Ops Integration: Concessions team is looking for a motivated, creative and customer obsessed Snr. Applied Scientist with a strong machine learning background, to develop advanced analytics models (Computer Vision, LLMs, etc.) that improve customer experiences We are the voice of the customer in Amazon’s operations, and we take that role very seriously. If you join this team, you will be a key contributor to delivering the Factory of the Future: leveraging Internet of Things (IoT) and advanced analytics to drive tangible, operational change on the ground. You will collaborate with a wide range of stakeholders (You will partner with Research and Applied Scientists, SDEs, Technical Program Managers, Product Managers and Business Leaders) across the business to develop and refine new ways of assessing challenges within Amazon operations. This role will combine Amazon’s oldest Leadership Principle, with the latest analytical innovations, to deliver business change at scale and efficiently The ideal candidate will have deep and broad experience with theoretical approaches and practical implementations of vision techniques for task automation. They will be a motivated self-starter who can thrive in a fast-paced environment. They will be passionate about staying current with sensing technologies and algorithms in the broader machine vision industry. They will enjoy working in a multi-disciplinary team of engineers, scientists and business leaders. They will seek to understand processes behind data so their recommendations are grounded. Key job responsibilities Your solutions will drive new system capabilities with global impact. You will design highly scalable, large enterprise software solutions involving computer vision. You will develop complex perception algorithms integrating across multiple sensing devices. You will develop metrics to quantify the benefits of a solution and influence project resources. You will validate system performance and use insights from your live models to drive the next generation of model development. Common tasks include: • Research, design, implement and evaluate complex perception and decision making algorithms integrating across multiple disciplines • Work closely with software engineering teams to drive scalable, real-time implementations • Collaborate closely with team members on developing systems from prototyping to production level • Collaborate with teams spread all over the world • Track general business activity and provide clear, compelling management reports on a regular basis We are open to hiring candidates to work out of one of the following locations: Berlin, BE, DEU | Berlin, DEU
DE, BY, Munich
Ops Integration: Concessions team is looking for a motivated, creative and customer obsessed Snr. Applied Scientist with a strong machine learning background, to develop advanced analytics models (Computer Vision, LLMs, etc.) that improve customer experiences We are the voice of the customer in Amazon’s operations, and we take that role very seriously. If you join this team, you will be a key contributor to delivering the Factory of the Future: leveraging Internet of Things (IoT) and advanced analytics to drive tangible, operational change on the ground. You will collaborate with a wide range of stakeholders (You will partner with Research and Applied Scientists, SDEs, Technical Program Managers, Product Managers and Business Leaders) across the business to develop and refine new ways of assessing challenges within Amazon operations. This role will combine Amazon’s oldest Leadership Principle, with the latest analytical innovations, to deliver business change at scale and efficiently The ideal candidate will have deep and broad experience with theoretical approaches and practical implementations of vision techniques for task automation. They will be a motivated self-starter who can thrive in a fast-paced environment. They will be passionate about staying current with sensing technologies and algorithms in the broader machine vision industry. They will enjoy working in a multi-disciplinary team of engineers, scientists and business leaders. They will seek to understand processes behind data so their recommendations are grounded. Key job responsibilities Your solutions will drive new system capabilities with global impact. You will design highly scalable, large enterprise software solutions involving computer vision. You will develop complex perception algorithms integrating across multiple sensing devices. You will develop metrics to quantify the benefits of a solution and influence project resources. You will validate system performance and use insights from your live models to drive the next generation of model development. Common tasks include: • Research, design, implement and evaluate complex perception and decision making algorithms integrating across multiple disciplines • Work closely with software engineering teams to drive scalable, real-time implementations • Collaborate closely with team members on developing systems from prototyping to production level • Collaborate with teams spread all over the world • Track general business activity and provide clear, compelling management reports on a regular basis We are open to hiring candidates to work out of one of the following locations: Munich, BE, DEU | Munich, BY, DEU | Munich, DEU
IT, MI, Milan
Ops Integration: Concessions team is looking for a motivated, creative and customer obsessed Snr. Applied Scientist with a strong machine learning background, to develop advanced analytics models (Computer Vision, LLMs, etc.) that improve customer experiences We are the voice of the customer in Amazon’s operations, and we take that role very seriously. If you join this team, you will be a key contributor to delivering the Factory of the Future: leveraging Internet of Things (IoT) and advanced analytics to drive tangible, operational change on the ground. You will collaborate with a wide range of stakeholders (You will partner with Research and Applied Scientists, SDEs, Technical Program Managers, Product Managers and Business Leaders) across the business to develop and refine new ways of assessing challenges within Amazon operations. This role will combine Amazon’s oldest Leadership Principle, with the latest analytical innovations, to deliver business change at scale and efficiently The ideal candidate will have deep and broad experience with theoretical approaches and practical implementations of vision techniques for task automation. They will be a motivated self-starter who can thrive in a fast-paced environment. They will be passionate about staying current with sensing technologies and algorithms in the broader machine vision industry. They will enjoy working in a multi-disciplinary team of engineers, scientists and business leaders. They will seek to understand processes behind data so their recommendations are grounded. Key job responsibilities Your solutions will drive new system capabilities with global impact. You will design highly scalable, large enterprise software solutions involving computer vision. You will develop complex perception algorithms integrating across multiple sensing devices. You will develop metrics to quantify the benefits of a solution and influence project resources. You will validate system performance and use insights from your live models to drive the next generation of model development. Common tasks include: • Research, design, implement and evaluate complex perception and decision making algorithms integrating across multiple disciplines • Work closely with software engineering teams to drive scalable, real-time implementations • Collaborate closely with team members on developing systems from prototyping to production level • Collaborate with teams spread all over the world • Track general business activity and provide clear, compelling management reports on a regular basis We are open to hiring candidates to work out of one of the following locations: Milan, MI, ITA
ES, M, Madrid
Ops Integration: Concessions team is looking for a motivated, creative and customer obsessed Snr. Applied Scientist with a strong machine learning background, to develop advanced analytics models (Computer Vision, LLMs, etc.) that improve customer experiences We are the voice of the customer in Amazon’s operations, and we take that role very seriously. If you join this team, you will be a key contributor to delivering the Factory of the Future: leveraging Internet of Things (IoT) and advanced analytics to drive tangible, operational change on the ground. You will collaborate with a wide range of stakeholders (You will partner with Research and Applied Scientists, SDEs, Technical Program Managers, Product Managers and Business Leaders) across the business to develop and refine new ways of assessing challenges within Amazon operations. This role will combine Amazon’s oldest Leadership Principle, with the latest analytical innovations, to deliver business change at scale and efficiently The ideal candidate will have deep and broad experience with theoretical approaches and practical implementations of vision techniques for task automation. They will be a motivated self-starter who can thrive in a fast-paced environment. They will be passionate about staying current with sensing technologies and algorithms in the broader machine vision industry. They will enjoy working in a multi-disciplinary team of engineers, scientists and business leaders. They will seek to understand processes behind data so their recommendations are grounded. Key job responsibilities Your solutions will drive new system capabilities with global impact. You will design highly scalable, large enterprise software solutions involving computer vision. You will develop complex perception algorithms integrating across multiple sensing devices. You will develop metrics to quantify the benefits of a solution and influence project resources. You will validate system performance and use insights from your live models to drive the next generation of model development. Common tasks include: • Research, design, implement and evaluate complex perception and decision making algorithms integrating across multiple disciplines • Work closely with software engineering teams to drive scalable, real-time implementations • Collaborate closely with team members on developing systems from prototyping to production level • Collaborate with teams spread all over the world • Track general business activity and provide clear, compelling management reports on a regular basis We are open to hiring candidates to work out of one of the following locations: Madrid, ESP | Madrid, M, ESP
US, TX, Austin
The role is available Arlington, Virginia (may consider New York, NY, Los Angeles, CA, or Toronto, Canada). Calling all inventors to work on exciting new opportunities in Sponsored Products. Amazon is building a world class advertising business and defining and delivering a collection of self-service performance advertising products that drive discovery and sales of merchandise. Our products are strategically important to our Retail and Marketplace businesses, driving long-term growth. Sponsored Products (SP) helps merchants, retail vendors, and brand owners grows incremental sales of their products sold on Amazon through native advertising. SP achieves this by using a combination of machine learning, big data analytics, ultra-low latency high-volume engineering systems, and quantitative product focus. We are a highly motivated, collaborative and fun-loving group with an entrepreneurial spirit and bias for action. You will join a newly-founded team with a broad mandate to experiment and innovate, which gives us the flexibility to explore and apply scientific techniques to novel product problems. You will have the satisfaction of seeing your work improve the experience of millions of Amazon shoppers while driving quantifiable revenue impact. More importantly, you will have the opportunity to broaden your technical skills, work with Generative AI, and be a science leader in an environment that thrives on creativity, experimentation, and product innovation. We are open to hiring candidates to work out of one of the following locations: Austin, TX, USA
GB, London
Ops Integration: Concessions team is looking for a motivated, creative and customer obsessed Snr. Applied Scientist with a strong machine learning background, to develop advanced analytics models (Computer Vision, LLMs, etc.) that improve customer experiences We are the voice of the customer in Amazon’s operations, and we take that role very seriously. If you join this team, you will be a key contributor to delivering the Factory of the Future: leveraging Internet of Things (IoT) and advanced analytics to drive tangible, operational change on the ground. You will collaborate with a wide range of stakeholders (You will partner with Research and Applied Scientists, SDEs, Technical Program Managers, Product Managers and Business Leaders) across the business to develop and refine new ways of assessing challenges within Amazon operations. This role will combine Amazon’s oldest Leadership Principle, with the latest analytical innovations, to deliver business change at scale and efficiently The ideal candidate will have deep and broad experience with theoretical approaches and practical implementations of vision techniques for task automation. They will be a motivated self-starter who can thrive in a fast-paced environment. They will be passionate about staying current with sensing technologies and algorithms in the broader machine vision industry. They will enjoy working in a multi-disciplinary team of engineers, scientists and business leaders. They will seek to understand processes behind data so their recommendations are grounded. Key job responsibilities Your solutions will drive new system capabilities with global impact. You will design highly scalable, large enterprise software solutions involving computer vision. You will develop complex perception algorithms integrating across multiple sensing devices. You will develop metrics to quantify the benefits of a solution and influence project resources. You will validate system performance and use insights from your live models to drive the next generation of model development. Common tasks include: • Research, design, implement and evaluate complex perception and decision making algorithms integrating across multiple disciplines • Work closely with software engineering teams to drive scalable, real-time implementations • Collaborate closely with team members on developing systems from prototyping to production level • Collaborate with teams spread all over the world • Track general business activity and provide clear, compelling management reports on a regular basis Basic Qualifications -Masters in Computer Science, Machine Learning, Robotics or equivalent with a focus on Computer Vision. -2+ years of experience of building machine learning models for business application -Broad knowledge of fundamentals and state of the art in computer vision and machine learning -Strong coding skills in two or more programming languages such as Python or C/C++ -Knowledge of fundamentals in optimization, supervised and reinforcement learning -Excellent problem-solving ability Preferred Qualifications -PhD and 4+ years of industry or academic applied research experience applying Computer Vision techniques and developing Computer vision algorithms -Depth and breadth in state-of-the-art computer vision and machine learning technologies and experience designing and building computer vision solutions -Industry experience in sensor systems and the development of production computer vision and machine learning applications built to use them -Experience developing software interfacing to AWS services -Excellent written and verbal communication skills with the ability to present complex technical information in a clear and concise manner to a variety of audiences -Ability to work on a diverse team or with a diverse range of coworkers -Experience in publishing at major Computer Vision, ML or Robotics conferences or Journals (CVPR, ICCV, ECCV, NeurIPS, ICML, IJCV, ICRA, IROS, RSS,...) We are open to hiring candidates to work out of one of the following locations: London, GBR
US, WA, Seattle
Want to work in a start-up environment with the resources of Amazon behind you? Do you want to have direct and immediate impact on millions of customers every day? If you are a self-starter, passionate about machine learning, deep learning, big data systems, enjoy designing and implementing new features and machine learned models, and intrigued by ambiguous problems, look no further. Amazon Advertising operates at the intersection of eCommerce and advertising, offering a rich array of digital display advertising solutions with the goal of helping our customers find and discover anything they want to buy. We help advertisers of all types to reach Amazon customers on Amazon.com, across our other owned and operated sites, on other high quality sites across the web, and on millions of Kindles, tablets, and mobile devices. We start with the customer and work backwards in everything we do, including advertising. If you’re interested in joining a rapidly growing team working to build a unique, world-class advertising group with a relentless focus on the customer, you’ve come to the right place. About Our Team: Our team is responsible for building a new advertising product for non-endemic advertisers. We are tasked with taking this start-up offering to market, with the goal of empowering over one million non-endemic advertisers to independently plan and execute campaigns. “Non-endemic” brands offer products and services that are not sold/available in Amazon’s retail marketplace, including restaurants, hotels, airlines, insurance, telecom, and automobiles. We are embarking on a multi-year vision to democratize display advertising for non-endemic advertisers at self-service scale. This will open up Amazon Ads to self-service non-endemic demand— whether they sell on the Amazon store or not— to activate Amazon Ads first-party audiences built from shopping and streaming signals and access unique ad inventory to help grow their business. Open to hire in NYC or Seattle. Key job responsibilities - Drive end-to-end Machine Learning projects that have a high degree of ambiguity, scale, complexity. - Perform hands-on analysis and modeling of enormous data sets to develop insights that increase traffic monetization and merchandise sales, without compromising the shopper experience. - Build machine learning models, perform proof-of-concept, experiment, optimize, and deploy your models into production; work closely with software engineers to assist in productionizing your ML models. - Run A/B experiments, gather data, and perform statistical analysis. - Establish scalable, efficient, automated processes for large-scale data analysis, machine-learning model development, model validation and serving. - Research new and innovative machine learning approaches. - Train and fine-tune neural models including transformers and language models. - Recruit Applied Scientists to the team and provide mentorship. We are open to hiring candidates to work out of one of the following locations: Seattle, WA, USA
LU, Luxembourg
Ops Integration: Concessions team is looking for a motivated, creative and customer obsessed Snr. Applied Scientist with a strong machine learning background, to develop advanced analytics models (Computer Vision, LLMs, etc.) that improve customer experiences We are the voice of the customer in Amazon’s operations, and we take that role very seriously. If you join this team, you will be a key contributor to delivering the Factory of the Future: leveraging Internet of Things (IoT) and advanced analytics to drive tangible, operational change on the ground. You will collaborate with a wide range of stakeholders (You will partner with Research and Applied Scientists, SDEs, Technical Program Managers, Product Managers and Business Leaders) across the business to develop and refine new ways of assessing challenges within Amazon operations. This role will combine Amazon’s oldest Leadership Principle, with the latest analytical innovations, to deliver business change at scale and efficiently The ideal candidate will have deep and broad experience with theoretical approaches and practical implementations of vision techniques for task automation. They will be a motivated self-starter who can thrive in a fast-paced environment. They will be passionate about staying current with sensing technologies and algorithms in the broader machine vision industry. They will enjoy working in a multi-disciplinary team of engineers, scientists and business leaders. They will seek to understand processes behind data so their recommendations are grounded. Key job responsibilities Your solutions will drive new system capabilities with global impact. You will design highly scalable, large enterprise software solutions involving computer vision. You will develop complex perception algorithms integrating across multiple sensing devices. You will develop metrics to quantify the benefits of a solution and influence project resources. You will validate system performance and use insights from your live models to drive the next generation of model development. Common tasks include: • Research, design, implement and evaluate complex perception and decision making algorithms integrating across multiple disciplines • Work closely with software engineering teams to drive scalable, real-time implementations • Collaborate closely with team members on developing systems from prototyping to production level • Collaborate with teams spread all over the world • Track general business activity and provide clear, compelling management reports on a regular basis We are open to hiring candidates to work out of one of the following locations: Luxembourg, LUX