Lessons learned from 10 years of DynamoDB

Prioritizing predictability over efficiency, adapting data partitioning to traffic, and continuous verification are a few of the principles that help ensure stability, availability, and efficiency.

Amazon DynamoDB is one of the most popular NoSQL database offerings on the Internet, designed for simplicity, predictability, scalability, and reliability. To celebrate DynamoDB’s 10th anniversary, the DynamoDB team wrote a paper describing lessons we’d learned in the course of expanding a fully managed cloud-based database system to hundreds of thousands of customers. The paper was presented at this year’s USENIX ATC conference.

The paper captures the following lessons that we have learned over the years:

  • Designing systems for predictability over absolute efficiency improves system stability. While components such as caches can improve performance, they should not introduce bimodality, in which the system has two radically different ways of responding to similar requests (e.g., one for cache misses and one for cache hits). Consistent behaviors ensure that the system is always provisioned to handle the unexpected. 
  • Adapting to customers’ traffic patterns to redistribute data improves customer experience. 
  • Continuously verifying idle data is a reliable way to protect against both hardware failures and software bugs in order to meet high durability goals. 
  • Maintaining high availability as a system evolves requires careful operational discipline and tooling. Mechanisms such as formal proofs of complex algorithms, game days (chaos and load tests), upgrade/downgrade tests, and deployment safety provide the freedom to adjust and experiment with the code without the fear of compromising correctness. 

Before we dig deeper into these topics, a little terminology. A DynamoDB table is a collection of items (e.g., products), and each item is a collection of attributes (e.g., name, price, and category). Each item is uniquely identified by its primary key. In DynamoDB, tables are typically partitioned, or divided into smaller sub-tables, which are assigned to nodes. A node is a set of dedicated computational resources (a virtual machine) running on a single server in a datacenter.

DynamoDB stores three copies of each partition, in different availability zones. This makes the partition highly available and durable because the availability zones’ storage resources share nothing and are substantially independent. For instance, we wouldn’t assign a partition and one of its copies to nodes that share a power supply, because a power outage would take both of them offline. The three copies of the same partition are known as a replication group, and there is a leader for the group that is responsible for replicating all the customer mutations and serving strongly consistent reads.

Figure: The DynamoDB architecture, including a request router, the partition metadata system, and storage nodes in different availability zones (AZs).

Those definitions in hand, let’s turn to our lessons learned.

Predictability over absolute efficiency

DynamoDB employs a lot of metadata caches in order to reduce latency. One of those caches stores the routing metadata for data requests. This cache is deployed on a fleet of thousands of request routers, DynamoDB’s front-end service.

In the original implementation, when the request router received the first request for a table, it downloaded the routing information for the entire table and cached it locally. Since the configuration information about partition replicas rarely changed, the cache hit rate was approximately 99.75%.

This was an excellent hit rate. The downside was that the fallback mechanism for this cache was to hit the metadata table directly. When the cache became ineffective, the metadata table had to scale instantaneously from handling 0.25% of requests to handling 100%. Such a sudden increase in traffic could cause the metadata table to fail, triggering cascading failures in other parts of the system. To guard against such failures, we redesigned our caches to behave predictably.
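To put a number on that jump, using only the 99.75% hit rate quoted above: the metadata table normally sees the 0.25% of requests that miss the cache, so a fully ineffective cache multiplies its load by

```latex
\[
\frac{100\%}{100\% - 99.75\%} = \frac{100}{0.25} = 400 .
\]
```

In other words, the fallback path would have to absorb roughly a 400-fold increase in traffic with no warning.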

First, we built an in-memory datastore called MemDS, which significantly reduced request routers’ and other metadata clients’ reliance on local caches. MemDS stores all the routing metadata in a highly compressed manner and replicates it across a fleet of servers. MemDS scales horizontally to handle all incoming requests to DynamoDB.

Second, we deployed a new local cache that avoids the bimodality of the original cache. All requests, even those satisfied by the local cache, are also sent asynchronously to MemDS. This ensures that the MemDS fleet always serves a steady volume of traffic, regardless of whether a request is a cache hit or a miss. Regularly exercising the fallback code in this way helps prevent surprises during fallback.

Figure: DynamoDB architecture with MemDS.

Unlike conventional local caches, MemDS sees traffic that is proportional to the customer traffic seen by the service; thus, during cache failures, it does not see a sudden amplification of traffic. Doing constant work removed the need for complex logic to handle edge cases around cache misses and reduced the reliance on local caches, improving system stability.
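The constant-work idea can be illustrated with a minimal Python sketch. It is not DynamoDB's implementation: `MetadataStore`, `ConstantWorkCache`, and the routing key are hypothetical names, and a thread pool stands in for whatever asynchronous mechanism the real request routers use. The point it shows is that the backing store receives one lookup per request whether or not the local cache hits.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class MetadataStore:
    """Hypothetical stand-in for the MemDS fleet; counts the lookups it serves."""

    def __init__(self, routes):
        self._routes = dict(routes)
        self._lock = threading.Lock()
        self.lookups = 0

    def lookup(self, key):
        with self._lock:
            self.lookups += 1
        return self._routes[key]


class ConstantWorkCache:
    """Local cache that sends every request to the store, hit or miss."""

    def __init__(self, store):
        self._store = store
        self._cache = {}
        self._pool = ThreadPoolExecutor(max_workers=2)

    def _refresh(self, key):
        self._cache[key] = self._store.lookup(key)

    def get(self, key):
        if key in self._cache:
            # Hit: answer locally, but still issue the lookup in the background.
            self._pool.submit(self._refresh, key)
            return self._cache[key]
        # Miss: the same lookup, except that the caller waits for it.
        self._refresh(key)
        return self._cache[key]

    def close(self):
        """Wait for outstanding background lookups to finish."""
        self._pool.shutdown(wait=True)


store = MetadataStore({"orders/partition-1": "storage-node-a"})
cache = ConstantWorkCache(store)
answers = [cache.get("orders/partition-1") for _ in range(5)]
cache.close()
print(store.lookups)  # 5: one store lookup per request, although 4 were cache hits
```

Because the store's load tracks request volume rather than miss volume, emptying the local cache in this sketch changes only who waits for the lookup, not how many lookups the store has to serve.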

Reshaping partitioning based on traffic

Partitions offer a way to dynamically scale both the capacity and performance of tables. In the original DynamoDB release, customers explicitly specified the throughput that a table required in terms of read capacity units (RCUs) and write capacity units (WCUs). The original system assigned partitions to nodes based on both available space and computational capacity.

As the demands on a table changed (because it grew in size or because the load increased), partitions could be further split to allow the table to scale elastically. The partition abstraction proved valuable and continues to be central to the design of DynamoDB.

However, the early version of DynamoDB assigned both space and capacity to individual partitions on the basis of size, evenly distributing computational resources across table entries. This led to challenges of “hot partitions” and throughput dilution.

Hot partitions happened because customer workloads were not uniformly distributed and kept hitting a subset of items. Throughput dilution happened when partitions that had been split to handle increased load ended up with so few keys that they could quickly max out their meager allocated capacity.
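A small worked example makes throughput dilution concrete. The figures below (a 1,000-WCU table, four partitions, a hot key needing 200 WCUs) are invented for illustration, and the even-share function is a simplification of the early allocation scheme described above, not DynamoDB's actual code.

```python
def per_partition_throughput(table_wcu: int, partitions: int) -> float:
    """Even share of the table's provisioned throughput for each partition."""
    return table_wcu / partitions


TABLE_WCU = 1000   # hypothetical provisioned write capacity for the table
HOT_DEMAND = 200   # hypothetical write rate aimed at one hot partition

before_split = per_partition_throughput(TABLE_WCU, 4)  # 250.0 WCUs per partition
after_split = per_partition_throughput(TABLE_WCU, 8)   # 125.0 WCUs per partition

throttled_before = HOT_DEMAND > before_split  # False: 200 fits within 250
throttled_after = HOT_DEMAND > after_split    # True: 200 exceeds 125

print(before_split, after_split, throttled_before, throttled_after)
```

The table's total capacity never changed, yet after the split the hot partition is throttled, because its share of the total was halved while its traffic was not.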

Our initial response to these challenges was to add bursting and adaptive capacity (along with other features such as split for consumption) to DynamoDB. This line of work also led to the launch of on-demand tables.

Bursting is a way to absorb temporary spikes in workloads at a partition level. It’s based on the observation that not all partitions hosted by a storage node use their allocated throughput simultaneously.

The idea is to let applications tap into unused capacity at a partition level on a best-effort basis to absorb short-lived spikes. DynamoDB still maintains workload isolation by ensuring that a partition can burst only if there is unused throughput at the node level.
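The admission rule can be sketched in Python as follows. Everything here is an assumption made for illustration: the class names, the single fixed time window, and the choice to model "unused throughput at the node level" as node tokens left over after reserving the other partitions' remaining allocations. The real system's accounting is not described in this article.

```python
class BurstablePartition:
    def __init__(self, allocated, burst):
        self.allocated = allocated  # tokens provisioned for the current window
        self.burst = burst          # tokens saved up for best-effort bursting


class StorageNode:
    """Hypothetical node; assumes node_tokens >= sum of partition allocations."""

    def __init__(self, node_tokens, partitions):
        self.node_tokens = node_tokens
        self.partitions = partitions

    def _reserved_for_others(self, name):
        return sum(p.allocated for n, p in self.partitions.items() if n != name)

    def admit(self, name):
        p = self.partitions[name]
        if p.allocated > 0:
            # Within the partition's own allocation: always admitted.
            p.allocated -= 1
            self.node_tokens -= 1
            return True
        # Beyond the allocation: burst only into capacity nobody else is owed.
        spare = self.node_tokens - self._reserved_for_others(name)
        if p.burst > 0 and spare > 0:
            p.burst -= 1
            self.node_tokens -= 1
            return True
        return False


node = StorageNode(
    node_tokens=12,
    partitions={
        "hot": BurstablePartition(allocated=5, burst=10),
        "cold": BurstablePartition(allocated=5, burst=10),
    },
)
hot_admitted = sum(node.admit("hot") for _ in range(9))
cold_admitted = sum(node.admit("cold") for _ in range(5))
print(hot_admitted, cold_admitted)  # 7 5
```

The hot partition gets its five allocated requests plus two bursts, which is all the node has to spare, and the cold partition still receives its full allocation afterward. That is the workload isolation the paragraph above describes.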

DynamoDB also launched adaptive capacity to handle long-lived spikes that cannot be absorbed by the burst capacity. Adaptive capacity monitors traffic patterns and repartitions tables so that heavily accessed items reside on different nodes.

Both bursting and adaptive capacity had limitations, however. Bursting was helpful only for short-lived spikes in traffic, and it was dependent on nodes’ having enough throughput to support it. Adaptive capacity was reactive and kicked in only after transmission rates had been throttled down to avoid overloads.

To address these limitations, the DynamoDB team replaced adaptive capacity with global admission control (GAC). GAC builds on the idea of token buckets, in which bandwidth is allocated to network nodes as tokens, and the nodes “cash in” tokens in order to transmit data. Each request router maintains a local token bucket and communicates with GAC to replenish tokens at regular intervals (on the order of every few seconds). For an extra layer of defense, DynamoDB also uses token buckets at the partition level.
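A minimal sketch of the interaction between request routers and GAC is shown below. The class and method names, the batch size, and the token counts are all hypothetical, and `replenish()` is called by hand here to stand in for the periodic timer. The sketch also leaves out the partition-level buckets.

```python
class GlobalAdmissionControl:
    """Hypothetical stand-in for GAC: one shared token bucket for a table."""

    def __init__(self, table_tokens):
        self.table_tokens = table_tokens

    def request_tokens(self, wanted):
        granted = min(wanted, self.table_tokens)
        self.table_tokens -= granted
        return granted


class RequestRouter:
    def __init__(self, gac, batch=10):
        self.gac = gac
        self.batch = batch
        self.local_tokens = 0

    def replenish(self):
        """Would run at regular intervals; invoked manually in this sketch."""
        self.local_tokens += self.gac.request_tokens(self.batch)

    def admit(self):
        # Each admitted request spends one token from the router's local bucket.
        if self.local_tokens == 0:
            return False
        self.local_tokens -= 1
        return True


gac = GlobalAdmissionControl(table_tokens=25)
router_a, router_b = RequestRouter(gac), RequestRouter(gac)
router_a.replenish()  # router_a holds 10 tokens, GAC has 15 left
router_b.replenish()  # router_b holds 10 tokens, GAC has 5 left

admitted = sum(router_a.admit() for _ in range(12))  # 10 admitted, 2 rejected
router_a.replenish()  # only 5 tokens remain globally, so router_a receives 5
print(admitted, router_a.local_tokens, gac.table_tokens)  # 10 5 0
```

Routers decide locally on every request and only talk to the shared bucket when replenishing, so the table-wide limit is enforced without a network round trip per request.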

Continuous verification 

To provide durability and crash recovery, DynamoDB uses write-ahead logs, which record data writes before they occur. In the event of a crash, DynamoDB can use the write-ahead logs to reconstruct lost data writes, bringing partitions up to date.
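The log-then-apply pattern can be shown with a toy Python example. `LoggedPartition` is a hypothetical class: a Python list plays the role of the durable log and a dict plays the role of the state that a crash wipes out. It illustrates the general write-ahead-logging technique, not DynamoDB's storage engine.

```python
class LoggedPartition:
    def __init__(self):
        self.log = []   # write-ahead log: assumed to survive a crash
        self.data = {}  # working state: assumed to be lost in a crash

    def put(self, key, value):
        self.log.append((key, value))  # 1. record the intended write first
        self.data[key] = value         # 2. only then apply it

    def crash(self):
        self.data = {}

    def recover(self):
        # Replaying the log in order rebuilds the state, last write wins.
        for key, value in self.log:
            self.data[key] = value


partition = LoggedPartition()
partition.put("item-1", "draft")
partition.put("item-1", "final")
partition.put("item-2", "new")
partition.crash()
state_after_crash = dict(partition.data)
partition.recover()
print(state_after_crash, partition.data)
# {} {'item-1': 'final', 'item-2': 'new'}
```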

Write-ahead logs are stored in all three replicas of a partition. For higher durability, the write-ahead logs are periodically archived to S3, an object store that is designed for 99.999999999% (11 nines) durability. Each replica contains the most recent write-ahead logs, which are usually waiting to be archived. The unarchived logs are typically a few hundred megabytes in size.

Figure: Healing a storage replica by copying the B-tree can take several minutes, while adding a log replica, which takes only a few seconds, ensures that there is no impact on durability.

DynamoDB continuously verifies data at rest. Our goal is to detect any silent data errors or “bit rot” — bit errors caused by degradation of the storage medium. An example of continuous verification is the scrub process.

The scrub process verifies two things: that all three copies in a replication group have the same data and that the live replicas match a reference replica built offline using the archived write-ahead-log entries.

The verification is done by computing the checksum of the live replica and matching that with a snapshot of the reference replica. A similar technique is used to verify replicas of global tables. Over the years, we have learned that continuous verification of data at rest is the most reliable method of protecting against hardware failures, silent data corruption, and even software bugs.
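The scrub comparison can be sketched in a few lines of Python. This is an illustration under stated assumptions: replicas are modeled as plain dicts, the archived log as a list of key-value writes, and SHA-256 is used because the article does not say which checksum DynamoDB computes. The function and replica names are hypothetical.

```python
import hashlib


def checksum(items: dict) -> str:
    """Order-independent digest of a replica's contents."""
    digest = hashlib.sha256()
    for key in sorted(items):
        digest.update(repr((key, items[key])).encode("utf-8"))
    return digest.hexdigest()


def build_reference(archived_log):
    """Rebuild a reference replica offline by replaying archived log entries."""
    reference = {}
    for key, value in archived_log:
        reference[key] = value
    return reference


def scrub(live_replicas: dict, archived_log) -> list:
    """Return the names of live replicas that disagree with the reference."""
    expected = checksum(build_reference(archived_log))
    return [name for name, items in live_replicas.items()
            if checksum(items) != expected]


archived_log = [("k1", "v1"), ("k2", "v2"), ("k1", "v3")]
live_replicas = {
    "az-a": {"k1": "v3", "k2": "v2"},
    "az-b": {"k2": "v2", "k1": "v3"},
    "az-c": {"k1": "v3", "k2": "vX"},  # simulated silent corruption
}
suspect = scrub(live_replicas, archived_log)
print(suspect)  # ['az-c']
```

Checking every live replica against the same reference covers both conditions at once: replicas that all match the reference necessarily match one another.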

Availability

DynamoDB regularly tests its resilience to node, rack, and availability zone (AZ) failures. For example, to test the availability and durability of the overall service, DynamoDB performs power-off tests. Using realistic simulated traffic, a job scheduler powers off random nodes. At the end of all the power-off tests, the test tools verify that the data stored in the database is logically valid and not corrupted.

The first point about availability is that it needs to be measurable. DynamoDB is designed for 99.999% availability for global tables and 99.99% availability for regional tables. To ensure that these goals are being met, DynamoDB continuously monitors availability at the service and table levels. The tracked availability data is used to estimate customer-perceived availability trends and trigger alarms if the number of errors that customers see crosses a certain threshold.

These alarms are called customer-facing alarms (CFAs). Their goal is to report any availability-related problems so that they can be mitigated proactively, either automatically or through operator intervention. The key point to note here is that availability is measured not only on the server side but also on the client side.
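As a rough illustration of what these targets mean and how an error-rate alarm might be evaluated, consider the sketch below. The function names and request counts are invented, and comparing measured availability directly against the design target is an assumption. The article says only that alarms fire when customer-visible errors cross a certain threshold.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def availability(successes: int, errors: int) -> float:
    """Fraction of requests that succeeded in a measurement window."""
    total = successes + errors
    return 1.0 if total == 0 else successes / total


def should_alarm(successes: int, errors: int, target: float) -> bool:
    """Hypothetical alarm rule: fire when measured availability is below target."""
    return availability(successes, errors) < target


def downtime_budget_minutes(target: float) -> float:
    """Minutes of full unavailability per year that a target would tolerate."""
    return MINUTES_PER_YEAR * (1.0 - target)


REGIONAL_TARGET = 0.9999
GLOBAL_TARGET = 0.99999

quiet = should_alarm(successes=999_950, errors=50, target=REGIONAL_TARGET)
noisy = should_alarm(successes=999_800, errors=200, target=REGIONAL_TARGET)
print(quiet, noisy)  # False True
print(round(downtime_budget_minutes(REGIONAL_TARGET), 2))  # 52.56
print(round(downtime_budget_minutes(GLOBAL_TARGET), 2))    # 5.26
```

Put differently, 99.99% leaves a budget of under an hour of unavailability per year, and 99.999% leaves only about five minutes.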

We also use two sets of clients to measure the user-perceived availability. The first set of clients is internal Amazon services using DynamoDB as the data store. These services share the availability metrics for DynamoDB API calls as observed by their software.

The second set of clients is our DynamoDB canary applications. These applications are run from every AZ in the region, and they talk to DynamoDB through every public endpoint. Real application traffic allows us to reason about DynamoDB availability and latencies as seen by our customers. The canary applications offer a good representation of what our customers might be experiencing both long and short term.

The second point is that read and write availability need to be handled differently. A partition’s write availability depends on the health of its leader and of its write quorum, meaning two out of the three replicas from different AZs. A partition remains available as long as there are enough healthy replicas for a write quorum and a leader.
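The write-availability condition translates directly into code. In this sketch the function name, the replica labels, and the health map are hypothetical, and the quorum size of two comes from the paragraph above.

```python
def write_available(replica_health: dict, leader: str, quorum: int = 2) -> bool:
    """A partition accepts writes if its leader is healthy and a quorum exists."""
    healthy = [name for name, ok in replica_health.items() if ok]
    return replica_health.get(leader, False) and len(healthy) >= quorum


all_up = write_available({"az-a": True, "az-b": True, "az-c": True}, leader="az-a")
one_follower_down = write_available({"az-a": True, "az-b": False, "az-c": True}, leader="az-a")
leader_down = write_available({"az-a": False, "az-b": True, "az-c": True}, leader="az-a")
two_down = write_available({"az-a": True, "az-b": False, "az-c": False}, leader="az-a")
print(all_up, one_follower_down, leader_down, two_down)  # True True False False
```

Losing one follower leaves writes available. Losing the leader makes the partition unavailable for writes until a new leader is elected, even though two healthy replicas remain.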

In a large service, hardware failures such as memory and disk failures are common. When a node fails, all replication groups hosted on the node are down to two copies. The process of healing a storage replica can take several minutes because the repair process involves copying both the B-tree, the data structure that holds the partition’s key-value data, and the write-ahead logs.

Upon detecting an unhealthy storage replica, the leader of a replication group adds a log replica to ensure there is no impact on durability. Adding a log replica takes only a few seconds, because the system has to copy only the most recent write-ahead logs from a healthy replica; reconstructing the more memory-intensive B-tree can wait. Quick healing of affected replication groups using log replicas thus ensures the high durability of the most recent writes. Adding a log replica is also the fastest way to restore the group’s write quorum, which minimizes disruption to write availability. The leader replica serves strongly consistent reads.

Introducing log replicas was a big change to the system, but the Paxos consensus protocol, which is formally provable, gave us the confidence to safely tweak and experiment with the system to achieve higher availability. We have been able to run millions of Paxos groups in a region with log replicas. Eventually consistent reads can be served by any of the replicas. If a leader fails, the other replicas detect its failure and elect a new leader to minimize disruptions to the availability of strongly consistent reads.

Research areas

Related content

US, CA, Pasadena
The Amazon Web Services (AWS) Center for Quantum Computing (CQC) is a multi-disciplinary team of scientists, engineers, and technicians on a mission to develop a fault-tolerant quantum computer. You will be joining a team located in Pasadena, CA that conducts materials research to improve the performance of superconducting quantum processors. We seek a Quantum Research Scientist to investigate how material defects affect qubit performance. In this role, you will combine expertise in numerical simulations and materials characterization to study materials loss mechanisms such as two-level systems, quasiparticles, vortices, etc. Key job responsibilities Provide subject matter expertise on integrated experimental and computational studies of materials defects Develop and use computational tools for large-scale simulations of disordered structures Develop and implement multi-technique materials characterization workflows for thin films and devices, with a focus on the surfaces and interfaces Identify material properties that can be a reliable proxy for the performance of superconducting resonators and qubits Communicate findings to teammates, the broader CQC team and, when appropriate, publish findings in scientific journals A day in the life At the AWS CQC, we understand that developing quantum computing technology is a marathon, not a sprint. The work/life integration within our team encourages a culture where employees work hard and also have ownership over their downtime. We are committed to the growth and development of every employee at the AWS CQC, and that includes our research scientists. You will receive management and mentorship from within the team that is geared toward career growth, and also have the opportunity to participate in Amazon's mentorship programs for scientists and engineers. 
Working closely with other quantum research scientists in other disciplines – like design, measurement and cryogenic hardware – will provide opportunities to dive deep into an education on quantum computing. About the team Our team contributes to the fabrication of processors and other hardware that enable quantum computing technologies. Doing that necessitates the development of materials with tailored properties for superconducting circuits. Research Scientists and Engineers on the Materials team operate deposition and characterization systems in order to develop and optimize thin film processes for use in these devices. They work alongside other Research Scientists and Engineers to help deliver the fabricated devices for quantum computing experiments. Export Control Requirement: Due to applicable export control laws and regulations, candidates must be either a U.S. citizen or national, U.S. permanent resident (i.e., current Green Card holder), or lawfully admitted into the U.S. as a refugee or granted asylum, or be able to obtain a U.S export license. If you are unsure if you meet these requirements, please apply and Amazon will review your application for eligibility. About the team Diverse Experiences AWS values diverse experiences. Even if you do not meet all of the preferred qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying. Why AWS? Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating — that’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses. Inclusive Team Culture AWS values curiosity and connection. 
Our employee-led and company-sponsored affinity groups promote inclusion and empower our people to take pride in what makes us unique. Our inclusion events foster stronger, more collaborative teams. Our continual innovation is fueled by the bold ideas, fresh perspectives, and passionate voices our teams bring to everything we do. Mentorship & Career Growth We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional. Work/Life Balance We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve. Export Control Requirement: Due to applicable export control laws and regulations, candidates must be either a U.S. citizen or national, U.S. permanent resident (i.e., current Green Card holder), or lawfully admitted into the U.S. as a refugee or granted asylum, or be able to obtain a U.S export license. If you are unsure if you meet these requirements, please apply and Amazon will review your application for eligibility.
US, CA, Cupertino
We are seeking a highly skilled Data Scientist to join our Machine Learning Architecture team, focusing on power and performance optimization for ML acceleration workloads across Amazon's global data center infrastructure. This role combines advanced data science techniques with deep technical understanding of ML hardware acceleration to drive efficiency improvements in training and inference workloads at massive scale. Key job responsibilities ata Analysis & Optimization * Analyze power consumption and performance metrics across all Amazon data centers for machine learning acceleration workloads * Develop predictive models and statistical frameworks to identify optimization opportunities and performance bottlenecks * Create automated monitoring and alerting systems for power and performance anomalies Strategic Planning & Deployment Guidance * Provide data-driven recommendations for server deployments and capacity planning decisions across Amazon's global data center network * Develop optimization scenarios and business cases to improve capacity delivery efficiency to customers worldwide * Support strategic decision-making through comprehensive analysis of power, performance, and cost trade-offs Cross-Functional Collaboration * Partner with software engineering teams to optimize ML frameworks, drivers, and runtime systems * Collaborate with hardware engineering teams to influence chip design, server architecture, and cooling system optimization * Work closely with data center operations teams to implement and validate optimization strategies Research & Development * Conduct applied research on emerging ML acceleration technologies and their power/performance characteristics * Develop novel methodologies for measuring and improving energy efficiency in large-scale ML workloads * Publish findings and contribute to industry best practices in sustainable ML infrastructure
IN, KA, Bengaluru
Amazon Devices is an inventive research and development company that designs and engineer high-profile devices like the Kindle family of products, Fire Tablets, Fire TV, Health Wellness, Amazon Echo & Astro products. This is an exciting opportunity to join Amazon in developing state-of-the-art techniques that bring Gen AI on edge for our consumer products. We are looking for exceptional scientists to join our Applied Science team and help develop the next generation of edge models, and optimize them while doing co-designed with custom ML HW based on a revolutionary architecture. Work hard. Have Fun. Make History. Key job responsibilities What will you do? - Quantize, prune, distill, finetune Gen AI models to optimize for edge platforms - Fundamentally understand Amazon’s underlying Neural Edge Engine to invent optimization techniques - Analyze deep learning workloads and provide guidance to map them to Amazon’s Neural Edge Engine - Use first principles of Information Theory, Scientific Computing, Deep Learning Theory, Non Equilibrium Thermodynamics - Train custom Gen AI models that beat SOTA and paves path for developing production models - Collaborate closely with compiler engineers, fellow Applied Scientists, Hardware Architects and product teams to build the best ML-centric solutions for our devices - Publish in open source and present on Amazon's behalf at key ML conferences - NeurIPS, ICLR, MLSys.
IN, KA, Bengaluru
RBS (Retail Business Services) Tech team works towards enhancing the customer experience (CX) and their trust in product data by providing technologies to find and fix Amazon CX defects at scale. Our platforms help in improving the CX in all phases of customer journey, including selection, discoverability & fulfilment, buying experience and post-buying experience (product quality and customer returns). The team also develops GenAI platforms for automation of Amazon Stores Operations. As a Sciences team in RBS Tech, we focus on foundational ML research and develop scalable state-of-the-art ML solutions to solve the problems covering customer experience (CX) and Selling partner experience (SPX). We work to solve problems related to multi-modal understanding (text and images), task automation through multi-modal LLM Agents, supervised and unsupervised techniques, multi-task learning, multi-label classification, aspect and topic extraction for Customer Anecdote Mining, image and text similarity and retrieval using NLP and Computer Vision for product groupings and identifying duplicate listings in product search results. Key job responsibilities As an Applied Scientist, you will be responsible to design and deploy scalable GenAI, NLP and Computer Vision solutions that will impact the content visible to millions of customer and solve key customer experience issues. You will develop novel LLM, deep learning and statistical techniques for task automation, text processing, image processing, pattern recognition, and anomaly detection problems. You will define the research and experiments strategy with an iterative execution approach to develop AI/ML models and progressively improve the results over time. You will partner with business and engineering teams to identify and solve large and significantly complex problems that require scientific innovation. You will independently file for patents and/or publish research work where opportunities arise. 
The RBS org deals with problems that are directly related to the selling partners and end customers and the ML team drives resolution to organization level problems. Therefore, the Applied Scientist role will impact the large product strategy, identifies new business opportunities and provides strategic direction which is very exciting.
IN, KA, Bengaluru
RBS (Retail Business Services) Tech team works towards enhancing the customer experience (CX) and their trust in product data by providing technologies to find and fix Amazon CX defects at scale. Our platforms help in improving the CX in all phases of customer journey, including selection, discoverability & fulfilment, buying experience and post-buying experience (product quality and customer returns). The team also develops GenAI platforms for automation of Amazon Stores Operations. As a Sciences team in RBS Tech, we focus on foundational ML research and develop scalable state-of-the-art ML solutions to solve the problems covering customer experience (CX) and Selling partner experience (SPX). We work to solve problems related to multi-modal understanding (text and images), task automation through multi-modal LLM Agents, supervised and unsupervised techniques, multi-task learning, multi-label classification, aspect and topic extraction for Customer Anecdote Mining, image and text similarity and retrieval using NLP and Computer Vision for product groupings and identifying duplicate listings in product search results. Key job responsibilities As an Applied Scientist, you will be responsible to design and deploy scalable GenAI, NLP and Computer Vision solutions that will impact the content visible to millions of customer and solve key customer experience issues. You will develop novel LLM, deep learning and statistical techniques for task automation, text processing, image processing, pattern recognition, and anomaly detection problems. You will define the research and experiments strategy with an iterative execution approach to develop AI/ML models and progressively improve the results over time. You will partner with business and engineering teams to identify and solve large and significantly complex problems that require scientific innovation. You will help the team leverage your expertise, by coaching and mentoring. 
You will contribute to the professional development of colleagues, improving their technical knowledge and the engineering practices. You will independently as well as guide team to file for patents and/or publish research work where opportunities arise. The RBS org deals with problems that are directly related to the selling partners and end customers and the ML team drives resolution to organization level problems. Therefore, the Applied Scientist role will impact the large product strategy, identifies new business opportunities and provides strategic direction which is very exciting.
US, WA, Seattle
About Sponsored Products and Brands The Sponsored Products and Brands (SPB) team at Amazon Ads is re-imagining the advertising landscape through state-of-art generative AI technologies, revolutionizing how millions of customers discover products and engage with brands across Amazon.com and beyond. We are at the forefront of re-inventing advertising experiences, bridging human creativity with artificial intelligence to transform every aspect of the advertising lifecycle from ad creation and optimization to performance analysis and customer insights. We are a passionate group of innovators dedicated to developing responsible and intelligent AI technologies that balance the needs of advertisers, enhance the shopping experience, and strengthen the marketplace. If you're energized by solving complex challenges and pushing the boundaries of what's possible with AI, join us in shaping the future of advertising. Key job responsibilities This role will be pivotal in redesigning how ads contribute to a personalized, relevant, and inspirational shopping experience, with the customer value proposition at the forefront. Key responsibilities include, but are not limited to: * Contribute to the design and development of GenAI, deep learning, multi-objective optimization and/or reinforcement learning empowered solutions to transform ad retrieval, auctions, whole-page relevance, and/or bespoke shopping experiences. * Collaborate cross-functionally with other scientists, engineers, and product managers to bring scalable, production-ready science solutions to life. * Stay abreast of industry trends in GenAI, LLMs, and related disciplines, bringing fresh and innovative concepts, ideas, and prototypes to the organization. * Contribute to the enhancement of team’s scientific and technical rigor by identifying and implementing best-in-class algorithms, methodologies, and infrastructure that enable rapid experimentation and scaling. 
* Mentor and grow junior scientists and engineers, cultivating a high-performing, collaborative, and intellectually curious team. A day in the life As an Applied Scientist on the Sponsored Products and Brands Off-Search team, you will contribute to the development in Generative AI (GenAI) and Large Language Models (LLMs) to revolutionize our advertising flow, backend optimization, and frontend shopping experiences. This is a rare opportunity to redefine how ads are retrieved, allocated, and/or experienced—elevating them into personalized, contextually aware, and inspiring components of the customer journey. You will have the opportunity to fundamentally transform areas such as ad retrieval, ad allocation, whole-page relevance, and differentiated recommendations through the lens of GenAI. By building novel generative models grounded in both Amazon’s rich data and the world’s collective knowledge, your work will shape how customers engage with ads, discover products, and make purchasing decisions. If you are passionate about applying frontier AI to real-world problems with massive scale and impact, this is your opportunity to define the next chapter of advertising science. About the team The Off-Search team within Sponsored Products and Brands (SPB) is focused on building delightful ad experiences across various surfaces beyond Search on Amazon—such as product detail pages, the homepage, and store-in-store pages—to drive monetization. Our vision is to deliver highly personalized, context-aware advertising that adapts to individual shopper preferences, scales across diverse page types, remains relevant to seasonal and event-driven moments, and integrates seamlessly with organic recommendations such as new arrivals, basket-building content, and fast-delivery options. To execute this vision, we work in close partnership with Amazon Stores stakeholders to lead the expansion and growth of advertising across Amazon-owned and -operated pages beyond Search. 
We operate full stack—from backend ads-retail edge services, ads retrieval, and ad auctions to shopper-facing experiences—all designed to deliver meaningful value.