Formally verified AES-XTS: The first AES algorithm to join s2n-bignum

Simplifying and clarifying the assembly code for core operations enabled automated optimization and verification.

Cryptographic encryption algorithms are mathematical procedures that transform readable data into ciphertext that looks like a stream of random bits. The ciphertext can be decrypted only with the corresponding decryption algorithm and the correct key.

For data at rest — information stored on disks or in databases — algorithms like AES-XTS encrypt each block before it’s written to storage, protecting against physical theft or unauthorized access to storage systems. For data in transit — information traveling across networks — protocols like TLS combine multiple algorithms: asymmetric encryption algorithms (RSA or elliptic curves) establish secure connections, while fast symmetric encryption algorithms (like AES-GCM) protect the actual data stream and verify that it hasn't been tampered with. At Amazon Web Services (AWS), we use AES-XTS to protect customer data in services like EBS, Nitro cards, and DynamoDB, while TLS with AES-GCM secures all network communication between services and to customers.

We took on the challenge of formally verifying an optimized Arm64 assembly implementation of AES-XTS decryption, where “formal verification” is the process of proving mathematically that an engineered system meets a particular specification.

Our work follows the IEEE Standard 1619 for cryptographic protection of block-oriented storage devices and focuses on the AES-256-XTS variant of AES-XTS. The “256” specifies the size of the encryption key.

Unlike algorithms that process fixed-size blocks, AES-XTS handles variable-length data from 16 bytes up to 16 megabytes, with special logic for incomplete blocks. The assembly code verified was a 5x-unrolled version, meaning that its loops were executed in parallel across five registers (each containing an input block), and it had been optimized for modern CPU pipelines. It was complex enough that manual review couldn't guarantee correctness yet critical enough that errors could compromise customer data security.

As part of Amazon Web Services’ s2n-bignum library of formally verified big-number operations, we contributed an improved Arm64 assembly implementation of AES-XTS encryption and decryption, as well as specification and formal verification using the HOL Light interactive theorem prover, which was developed by a member of our team (John Harrison).

This was an experiment in the proof-driven development of a large function with multiple paths based on the input length. It resulted in the largest proof so far in the s2n-bignum library. For the typical input size of 512 bytes, the performance of the algorithm either stayed close to that of original code or improved slightly on highly optimized Arm cores. By adding this algorithm and its proof to the s2n-bignum library, we pave the way for more AES-based algorithms to be added.

Description of the algorithm

AES is a block cipher that implements a keyed permutation. This means that it processes the plaintext files in blocks (in this case, blocks of 128 bits), and for any given key, it defines a bijective (one-to-one and invertible) function mapping each plaintext block to a unique ciphertext block. This mathematical property ensures that decryption can uniquely recover the original plaintext.

AES-XTS is the mode specifically designed for storage encryption. It uses AES as its underlying building block but adds position-dependent tweaks and ciphertext stealing — a method for handling partial blocks — to address the unique requirements of disk encryption, where you need random access to any sector and must preserve the exact data size.

AES-XTS encrypts storage data using a two-key approach where each 128-bit block and its position-dependent tweak are subjected to an exclusive-OR operation (XOR), a binary operation that outputs a one only if the input values differ. The result of the operation is encrypted with AES, then XORed with the tweak again, ensuring that identical data at different disk locations produces different ciphertext. The tweak is generated by encrypting the sector number with a second key, then multiplying by powers of α in a Galois field, creating unique values for each block position.

When the final block isn't a full 128 bits, ciphertext stealing kicks in. Ciphertext stealing borrows bytes from the previous block, allowing encryption of data of any length without padding or wasted space. This lets you read or write any sector independently — critical for disk encryption — while basing each block's encryption on its position. That is a desired feature since the security model of disk encryption allows the adversary to access sectors other than those in question, modify them, and request their decryption. It also ensures that the size of the ciphertext is exactly the same as that of the plaintext, so it fits in its place.

The basic XEX block encryption structure with the XOR-Encrypt-XOR flow v3.jpg
AES-XTS encryption employing an XOR-encrypt-XOR (XEX) structure.
Ciphertext-stealing-encryption.png
Ciphertext stealing handles partial blocks during encryption by splitting the second-to-last block's output.
Decryption.png
The symmetric decryption process.
Reverse ciphertext stealing for decryption.png
Reverse ciphertext stealing for decryption.

Control flow of the assembly implementation

We started from an existing implementation of AES-XTS in Amazon’s AWS-LC cryptographic library. AES-XTS loops through the plaintext in 128-bit blocks, and encryption of each block requires 15 steps, each with its own “round key” derived from the encryption key. The existing implementation is 5x unrolled, meaning it processes blocks in parallel, five at a time. If the final block is less than 128 bits in length, there’s a risk of “buffer overread”, or reading beyond the limits of the input buffer.

To avoid overread, the existing implementation does complex manipulation over the pointer to the current location in the input buffer. This requires a sophisticated control flow that can be hard to follow: the loop counter is incremented and decremented multiple times before and during the loop, and the loop has two additional exit points other than the final loop-back branch.

One exit point is for the case when four blocks remain during the final iteration of the loop; the other exit point is for the case of one to three blocks remaining. The flow of the loop interleaves the load/store instructions with the AES and XOR instructions in an effort to maximize pipeline usage. After the loop exit, the processing of the remaining blocks is intertwined for the lengths of four down to one; if there’s a partial block at the end, the algorithm performs the ciphertext-stealing procedure. Additionally, only seven of the 15 rounds’ keys were kept in registers; the other eight were repeatedly loaded from memory in the loop and outside it.

We first investigated whether we could improve the performance of the main loop by letting SLOTHY, a superoptimizer for Arm code, rearrange the instructions to maximize pipeline usage. SLOTHY contains simplified models of various Arm microarchitectures. It uses a constraint solver to provide optimal instruction schedule, register renaming, and periodic loop interleaving.

However, for SLOTHY to identify and optimize a loop, the loop has to exhibit typical loop behavior, decreasing the counter at the end of each iteration and then jumping back to the beginning. SLOTHY also cannot handle the nested loop created by loading the eight-round keys.

This gave us a reason to start “cleaning up” the loop. First, we freed up registers to permanently hold all round keys; this was possible as the optimized order of instructions required fewer temporary registers than the original code. Second, we removed the multiple exit points and the manipulation of the loop counter to handle the remaining blocks. The value of the counter always indicates the number of five-block chunks remaining in the buffer, conforming to SLOTHY’s requirement; the loop ends before the handling of the remaining blocks.

Once the loop ends, we have a separate processing branch to handle each possible number of remaining blocks, from one to four; all four branches end in ciphertext stealing. This can be seen in the control flow graphs of the encrypt and decrypt algorithms (see below). Throughout the code, we maintained the constant-time design mindset; that is, branching and special processing are based not on secret data but only on public values, the input byte lengths.

AES-256-XTS encryption control flow graph v2.jpg
AES-256-XTS encryption control flow graph.
AES-256-XTS decryption control flow graph v2.jpg
AES-256-XTS decryption control flow graph.

Performance

With our modifications to the code, we were able to use SLOTHY to optimize the encrypt algorithm. This resulted in slight performance gains on the AWS Graviton family of Arm processors, although the gains were smaller on the more advanced chips, which have an optimized out-of-order pipeline. Compared to the original implementation, keeping round keys in registers throughout the algorithm’s execution, to save on loading them from memory, allowed us to offset the effects of not interleaving the AES instructions with other ones.

Having a cleaner flow of instructions in the loop and modular exit processing allowed us to experiment with various unrolling factors for the loop iterations. We experimented with 3x, 4x, and 6x factors and concluded that 5x is still the best choice across various microarchitectures.

Ensuring correctness through formal verification

To deploy optimized cryptographic code in production, we need mathematical certainty that it works correctly. While random testing quickly checks simple cases, we rely on formal verification to deliver the highest level of assurance for our AES-XTS implementation.

Why HOL Light for AES-XTS?

To prove that our implementation matches the IEEE 1619 specification, we use HOL Light, an interactive theorem prover developed by our colleague John Harrison. HOL Light is a particularly simple implementation of the "correct by construction" approach to software development, in which code is verified as it’s written. HOL Light’s trusted kernel is just a few hundred lines of code, which implements basic logical inference rules. This means that even if there's a bug in our proof tactics or automation, it cannot cause HOL Light to accept an incorrect proof. At worst, a bug prevents us from completing a proof, but it cannot make a false statement provable.

We chose HOL Light for several reasons specific to AES-XTS verification:

  • Assembly-level verification: We write our implementations directly in assembly rather than relying on compiled code. While more challenging, this makes our proofs independent of any compiler. HOL Light reasons directly about machine code bytes using CPU instruction specifications, providing assurance at the lowest level of the software stack.
  • Existing cryptographic infrastructure: S2n-bignum already provides extensive support for cryptographic verification, including symbolic simulation that strips away execution artifacts and leaves purely mathematical problems, specialized tactics for word operations, and byte list handling. We add proven lemmas about AES operations that we can reuse for the proofs of other AES modes.
  • Complex control flow handling: Unlike fully automated provers that might fail on complex proofs without enough explanation, HOL Light's interactive approach lets us guide proofs through the intricate invariants required for our 5x-unrolled loops, processing arbitrarily long blocks of data and performing the complex memory reasoning required by variable-length inputs and partial blocks.

The s2n-bignum framework

Using s2n-bignum to implement AES-XTS serves two purposes: it's both a framework for formally verifying assembly code in x86-64 and Arm architectures and a collection of fast, verified assembly functions for cryptography. The library already contains verified implementations of numerous cryptographic algorithms, especially those pertaining to big-number mathematical operations (hence the name), which are the foundation of public-key cryptographic primitives. For details on how HOL Light was used to prove public-key algorithms as part of s2n-bignum, please refer to the previous Amazon Science blog posts “ Formal verification makes RSA faster — and faster to deploy” and “Better-performing ‘25519’ elliptic-curve cryptography”.

As we mentioned, AES-XTS is one of the modes of the AES block cipher. AES is based on a substitution-permutation network (SPN) structure, which combines substitution operations (SubBytes using the S-box), permutation operations (ShiftRows, MixColumns), and key mixing. By expanding s2n-bignum to include the AES instruction set architecture (ISA) found in Arm64 and x86_64 processors, specifications for the AES block cipher, and additional specifications for AES-XTS, we're paving the way for the same rigorous verification of more AES-based algorithms.

Developing and testing the specification

The SPN nature of AES and the modes that are based on it cannot be expressed using simple mathematical formulae — such as modular multiplication, which is fundamental to public-key cryptography — that can be innately understood by a theorem prover. They require writing descriptions of the steps for processing the data. This is why, before verifying the assembly, we needed confidence that our HOL Light specification accurately captured the IEEE standard.

We wrote the specification to mirror the standard's structure, using byte lists for input/output and 128-bit words for internal block operations. Then we developed conversions, HOL Light functions that we used to evaluate specifications with concrete inputs while generating proofs that the evaluations are mathematically correct.

We validated our specification by conducting unit tests that cover different AES-XTS encryption/decryption scenarios, exercising the processing of all blocks (using recursion) and ciphertext stealing.

These tests confirmed that our specification matched the IEEE standard before we tackled the more complex assembly verification. This two-phase approach — first ensuring that the specification is correct through testing, then formally verifying that the implementation matches the specification — gave us confidence we were proving the right thing.

The proof strategy

Our proofs are compositional, meaning they break the overall problem into subproblems that can be proved separately. Depending on the subproblem, the subproofs can be bounded — true only for a range of inputs — or unbounded.

For inputs with fewer than five (or six, in the case of decrypt) blocks, we wrote bounded proofs that exhaustively verify each case. For inputs with five (six, in the case of decrypt) or more blocks, we developed loop invariants — mathematical statements that remain true throughout loop execution — to prove correctness for arbitrarily long inputs. The loop invariants track three critical factors until the loop exit condition is met: register states at each iteration, the evolution of "tweaks" (which make each block's encryption unique), and memory contents as blocks are processed. For partial-block (tail) handling, we proved a separate theorem for ciphertext stealing that could be reused across all cases.

The top-level correctness theorem composes all proofs together, asserting the following statement:

If the program, inputs, output, and stack satisfy 
certain disjointness properties, and the input length len
satisfies 16 <= len <= 224, then given that the initial
machine state is set up with proper symbolic values, the value
stored in the output buffer must satisfy the AES-XTS 
specification after the whole program is executed.

Memory safety and constant-time proofs

Most recently, s2n-bignum was equipped with new functions and tactics for formally defining the constant-time and memory safety properties of assembly functions. With these resources, many assembly subroutines in s2n-bignum were verified to be constant time and memory safe, including top-level scalar-multiplication functions in elliptic curves, big-integer arithmetic for RSA, and the Arm implementation of the ML-KEM cryptography standard (the subject of a forthcoming blog post on Amazon Science). All assembly subroutines identified for use in AWS-LC as of October 2025 were formally verified to be constant time and memory safe.

We are exploring whether the new tactics can easily be used to verify assembly subroutines that have subsequently been added, such as AES-XTS. As we mentioned, AES-XTS has a remarkably complex control flow, which resulted in a long and involved functional-correctness proof. That complexity is also a challenge for safety proofs. The process is ongoing, but we have already proved safety properties for the ciphertext-stealing subroutines of the decryption and encryption algorithms.

These first proofs focused on crucial memory access procedures that are prone to buffer overread. Proofs for the remaining parts of the decryption and encryption algorithms can use the same methodology, where the constant-time and memory-safety proofs follow the same structure as the functional-correctness proofs but are simpler, since their proof goal is more focused.

Continuous assurance of correctness

We've integrated formal verification into s2n-bignum's continuous-integration (CI) workflow. This provides assurance that no changes to our AES-XTS implementation can be committed without successfully passing a formal proof of correctness. As part of CI, the CPU instruction modeling is validated through randomized testing against real hardware, "fuzzing out" inaccuracies to ensure our specifications are correct and the proofs hold in practice.

Furthermore, the proof guarantees correctness for all possible inputs, since they’re represented in the proof as symbols. This overcomes the typical shortcoming of coverage testing, which may cover all paths of the code but may not be able to cover all input values. For example, a constant-time code, like the one used here, is written without branching on secret values. Typically, then, secret values are incorporated into the operation through the use of masks derived from them. The same instructions are executed irrespective of the secret value. Hence, achieving line coverage is usually within the reach of a developer, but achieving value coverage is left to the formal verification of correctness.

This same methodology has enabled AWS to deploy optimized cryptographic implementations with mathematical guarantees of correctness while achieving significant performance improvements. This allows the developer and tools to further optimize the code freely without worrying about introducing bugs, since these will be automatically caught by the proof. Our experience with AES-XTS shows that proof-driven development of assembly code yields a control flow that is easier to understand, review, maintain, and optimize while never ceasing to be provably correct.

Research areas

Related content

US, WA, Redmond
Amazon Leo is an initiative to launch a constellation of Low Earth Orbit satellites that will provide low-latency, high-speed broadband connectivity to unserved and underserved communities around the world. As a Communications Engineer in Modeling and Simulation, this role is primarily responsible for the developing and analyzing high level system resource allocation techniques for links to ensure optimal system and network performance from the capacity, coverage, power consumption, and availability point of view. Be part of the team defining the overall communication system and architecture of Amazon Leo’s broadband wireless network. This is a unique opportunity to innovate and define novel wireless technology with few legacy constraints. The team develops and designs the communication system of Leo and analyzes its overall system level performance, such as overall throughput, latency, system availability, packet loss, etc., as well as compatibility for both connectivity and interference mitigation with other space and terrestrial systems. This role in particular will be responsible for 1) evaluating complex multi-disciplinary trades involving RF bandwidth and network resource allocation to customers, 2) understanding and designing around hardware/software capabilities and constraints to support a dynamic network topology, 3) developing heuristic or solver-based algorithms to continuously improve and efficiently use available resources, 4) demonstrating their viability through detailed modeling and simulation, 5) working with operational teams to ensure they are implemented. This role will be part of a team developing the necessary simulation tools, with particular emphasis on coverage, capacity, latency and availability, considering the yearly growth of the satellite constellation and terrestrial network. Export Control Requirement: Due to applicable export control laws and regulations, candidates must be a U.S. citizen or national, U.S. permanent resident (i.e., current Green Card holder), or lawfully admitted into the U.S. as a refugee or granted asylum. Key job responsibilities • Work within a project team and take the responsibility for the Leo's overall communication system design and architecture • Extend existing code/tools and create simulation models representative of the target system, primarily in MATLAB • Design interconnection strategies between fronthaul and backhaul nodes. Analyze link availability, investigate link outages, and optimize algorithms to study and maximize network performance • Use RF and optical link budgets with orbital constellation dynamics to model time-varying system capacity • Conduct trade-off analysis to benefit customer experience and optimization of resources (costs, power, spectrum), including optimization of satellite constellation design and link selection • Work closely with implementation teams to simulate expected system level performance and provide quick feedback on potential improvements • Analyze and minimize potential self-interference or interference with other communication systems • Provide visualizations, document results, and communicate them across multi-disciplinary project teams to make key architectural decisions
US, WA, Seattle
We are looking for detail-oriented, organized, and responsible individuals who are eager to learn how to apply their causal inference / structural econometrics skillsets to solve real world problems. The intern will work in the area of Store Economics and Science (SEAS) and develop models to SEAS. Our PhD Economist Internship Program offers hands-on experience in applied economics, supported by mentorship, structured feedback, and professional development. Interns work on real business and research problems, building skills that prepare them for full-time economist roles at Amazon and beyond. You will learn how to build data sets and perform applied econometric analysis collaborating with economists, scientists, and product managers. These skills will translate well into writing applied chapters in your dissertation and provide you with work experience that may help you with placement. These are full-time positions at 40 hours per week, with compensation being awarded on an hourly basis. About the team The Stores Economics and Science Team (SEAS) is a Stores-wide interdisciplinary team at Amazon with a "peak jumping" mission focused on disruptive innovation. The team applies science, economics, and engineering expertise to tackle the business's most critical problems, working to move from local to global optima across Amazon Stores operations. SEAS builds partnerships with organizations throughout Amazon Stores to pursue this mission, exploring frontier science while learning from the experience and perspective of others. Their approach involves testing solutions first at a small scale, then aligning more broadly to build scalable solutions that can be implemented across the organization. The team works backwards from customers using their unique scientific expertise to add value, takes on long-run and high-risk projects that business teams typically wouldn't pursue, helps teams with kickstart problems by building practical prototypes, raises the scientific bar at Amazon, and builds and shares software that makes Amazon more productive.
US, TX, Austin
Amazon Security is seeking an Applied Scientist to work on GenAI acceleration within the Secure Third Party Tools (S3T) organization. The S3T team has bold ambitions to re-imagine security products that serve Amazon's pace of innovation at our global scale. This role will focus on leveraging large language models and agentic AI to transform third-party security risk management, automate complex vendor assessments, streamline controllership processes, and dramatically reduce assessment cycle times. You will drive builder efficiency and deliver bar-raising security engagements across Amazon. Key job responsibilities Own and drive end-to-end technical delivery for scoped science initiatives focused on third-party security risk management, independently defining research agendas, success metrics, and multi-quarter roadmaps with minimal oversight. Understanding approaches to automate third-party security review processes using state-of-the-art large language models, development intelligent systems for vendor assessment document analysis, security questionnaire automation, risk signal extraction, and compliance decision support. Build advanced GenAI and agentic frameworks including multi-agent orchestration, RAG pipelines, and autonomous workflows purpose-built for third-party risk evaluation, security documentation processing, and scalable vendor assessment at enterprise scale. Build ML-powered risk intelligence capabilities that enhance third-party threat detection, vulnerability classification, and continuous monitoring throughout the vendor lifecycle. Coordinate with Software Engineering and Data Engineering to deploy production-grade ML solutions that integrate seamlessly with existing third-party risk management workflows and scale across the organization. About the team Security is central to maintaining customer trust and delivering delightful customer experiences. At Amazon, our Security organization is designed to drive bar-raising security engagements. Our vision is that Builders raise the Amazon security bar when they use our recommended tools and processes, with no overhead to their business. Diverse Experiences Amazon Security values diverse experiences. Even if you do not meet all of the qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying. Why Amazon Security? At Amazon, security is central to maintaining customer trust and delivering delightful customer experiences. Our organization is responsible for creating and maintaining a high bar for security across all of Amazon’s products and services. We offer talented security professionals the chance to accelerate their careers with opportunities to build experience in a wide variety of areas including cloud, devices, retail, entertainment, healthcare, operations, and physical stores. Inclusive Team Culture In Amazon Security, it’s in our nature to learn and be curious. Ongoing DEI events and learning experiences inspire us to continue learning and to embrace our uniqueness. Addressing the toughest security challenges requires that we seek out and celebrate a diversity of ideas, perspectives, and voices. Training & Career Growth We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, training, and other career-advancing resources here to help you develop into a better-rounded professional. Work/Life Balance We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why flexible work hours and arrangements are part of our culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve.
US, CA, Mountain View
At AWS Healthcare AI, we're revolutionizing healthcare delivery through AI solutions that serve millions globally. As a pioneer in healthcare technology, we're building next-generation services that combine Amazon's world-class AI infrastructure with deep healthcare expertise. Our mission is to accelerate our healthcare businesses by delivering intuitive and differentiated technology solutions that solve enduring business challenges. The AWS Healthcare AI organization includes services such as HealthScribe, Comprehend Medical, HealthLake, and more. We're seeking a Senior Applied Scientist to join our team working on our AI driven clinical solutions that are transforming how clinicians interact with patients and document care. Key job responsibilities To be successful in this mission, we are seeking an Applied Scientist to contribute to the research and development of new, highly influencial AI applications that re-imagine experiences for end-customers (e.g., consumers, patients), frontline workers (e.g., customer service agents, clinicians), and back-office staff (e.g., claims processing, medical coding). As a leading subject matter expert in NLU, deep learning, knowledge representation, foundation models, and reinforcement learning, you will collaborate with a team of scientists to invent novel, generative AI-powered experiences. This role involves defining research directions, developing new ML techniques, conducting rigorous experiments, and ensuring research translates to impactful products. You will be a hands-on technical innovator who is passionate about building scalable scientific solutions. You will set the standard for excellence, invent scalable, scientifically sound solutions across teams, define evaluation methods, and lead complex reviews. This role wields significant influence across AWS, Amazon, and the global research community.
US, MA, N.reading
Amazon Industrial Robotics Group is seeking exceptional talent to help develop the next generation of advanced robotics systems that will transform automation at Amazon's scale. We're building revolutionary robotic systems that combine cutting-edge AI, sophisticated control systems, and advanced mechanical design to create adaptable automation solutions capable of working safely alongside humans in dynamic environments. This is a unique opportunity to shape the future of robotics and automation at an unprecedented scale, working with world-class teams pushing the boundaries of what's possible in robotic dexterous manipulation, locomotion, and human-robot interaction. This role presents an opportunity to shape the future of robotics through innovative applications of deep learning and large language models. At Amazon Industrial Robotics Group, we leverage advanced robotics, machine learning, and artificial intelligence to solve complex operational challenges at an unprecedented scale. Our fleet of robots operates across hundreds of facilities worldwide, working in sophisticated coordination to fulfill our mission of customer excellence. We are pioneering the development of dexterous manipulation system that: - Enables unprecedented generalization across diverse tasks - Enables contact-rich manipulation in different environments - Seamlessly integrates low-level skills and high-level behaviors - Leverage mechanical intelligence, multi-modal sensor feedback and advanced control techniques. The ideal candidate will contribute to research that bridges the gap between theoretical advancement and practical implementation in robotics. You will be part of a team that's revolutionizing how robots learn, adapt, and interact with their environment. Join us in building the next generation of intelligent robotics systems that will transform the future of automation and human-robot collaboration. A day in the life - Work on design and implementation of methods for Visual SLAM, navigation and spatial reasoning - Leverage simulation and real-world data collection to create large datasets for model development - Develop a hierarchical system that combines low-level control with high-level planning - Collaborate effectively with multi-disciplinary teams to co-design hardware and algorithms for dexterous manipulation
US, WA, Seattle
Come be a part of a rapidly expanding $35 billion-dollar global business. At Amazon Business, a fast-growing startup passionate about building solutions, we set out every day to innovate and disrupt the status quo. We stand at the intersection of tech & retail in the B2B space developing innovative purchasing and procurement solutions to help businesses and organizations thrive. At Amazon Business, we strive to be the most recognized and preferred strategic partner for smart business buying. Bring your insight, imagination and a healthy disregard for the impossible. Join us in building and celebrating the value of Amazon Business to buyers and sellers of all sizes and industries. Unlock your career potential. Amazon Business Data Insights and Analytics team is looking for a Data Scientist to lead the research and thought leadership to drive our data and insights strategy for Amazon Business. This role is central in shaping the definition and execution of the long-term strategy for Amazon Business. You will be responsible for researching, experimenting and analyzing predictive and optimization models, designing and implementing advanced detection systems that analyze customer behavior at registration and throughout their journey. You will work on ambiguous and complex business and research science problems with large opportunities. You'll leverage diverse data signals including customer profiles, purchase patterns, and network associations to identify potential abuse and fraudulent activities. You are an analytical individual who is comfortable working with cross-functional teams and systems, working with state-of-the-art machine learning techniques and AWS services to build robust models that can effectively distinguish between legitimate business activities and suspicious behavior patterns You must be a self-starter and be able to learn on the go. Excellent written and verbal communication skills are required as you will work very closely with diverse teams. Key job responsibilities - Interact with business and software teams to understand their business requirements and operational processes - Frame business problems into scalable solutions - Adapt existing and invent new techniques for solutions - Gather data required for analysis and model building - Create and track accuracy and performance metrics - Prototype models by using high-level modeling languages such as R or in software languages such as Python. - Familiarity with transforming prototypes to production is preferred. - Create, enhance, and maintain technical documentation
US, NY, New York
We are seeking an Applied Scientist to lead the development of evaluation frameworks and data collection protocols for robotic capabilities. In this role, you will focus on designing how we measure, stress-test, and improve robot behavior across a wide range of real-world tasks. Your work will play a critical role in shaping how policies are validated and how high-quality datasets are generated to accelerate system performance. You will operate at the intersection of robotics, machine learning, and human-in-the-loop systems, building the infrastructure and methodologies that connect teleoperation, evaluation, and learning. This includes developing evaluation policies, defining task structures, and contributing to operator-facing interfaces that enable scalable and reliable data collection. The ideal candidate is highly experimental, systems-oriented, and comfortable working across software, robotics, and data pipelines, with a strong focus on turning ambiguous capability goals into measurable and actionable evaluation systems. Key job responsibilities - Design and implement evaluation frameworks to measure robot capabilities across structured tasks, edge cases, and real-world scenarios - Develop task definitions, success criteria, and benchmarking methodologies that enable consistent and reproducible evaluation of policies - Create and refine data collection protocols that generate high-quality, task-relevant datasets aligned with model development needs - Build and iterate on teleoperation workflows and operator interfaces to support efficient, reliable, and scalable data collection - Analyze evaluation results and collected data to identify performance gaps, failure modes, and opportunities for targeted data collection - Collaborate with engineering teams to integrate evaluation tooling, logging systems, and data pipelines into the broader robotics stack - Stay current with advances in robotics, evaluation methodologies, and human-in-the-loop learning to continuously improve internal approaches - Lead technical projects from conception through production deployment - Mentor junior scientists and engineers
US, NY, New York
The Sponsored Products and Brands team at Amazon Ads is re-imagining the advertising landscape through industry leading generative AI technologies, revolutionizing how millions of customers discover products and engage with brands across Amazon.com and beyond. We are at the forefront of re-inventing advertising experiences, bridging human creativity with artificial intelligence to transform every aspect of the advertising lifecycle from ad creation and optimization to performance analysis and customer insights. We are a passionate group of innovators dedicated to developing responsible and intelligent AI technologies that balance the needs of advertisers, enhance the shopping experience, and strengthen the marketplace. If you're energized by solving complex challenges and pushing the boundaries of what's possible with AI, join us in shaping the future of advertising. The Global Optimization (GO) team within Sponsored Products and Brands at Amazon Ads is re-imagining the advertising stack from the ground up across 20+ marketplaces. We are seeking an experienced Senior Data Scientist to join our team. You will develop scalable analytical approaches to evaluate marketplace performance across the entire Ads stack to uncover regional and marketplace-specific insights, design and run experiments, and shape our development roadmap. We operate as a closely integrated team of Data Scientists, Applied Scientists, and Engineers to translate data-driven insights into measurable business impact. If you're energized by solving complex challenges at international scale and pushing the boundaries of what's possible with GenAI, join us in shaping the future of global advertising at Amazon. Key job responsibilities As a Data Scientist on this team, you will: - Write code to obtain, manipulate, and analyze data to derive business insights. - Apply statistical and ML knowledge to specific business problems and data. - Analyze historical data to identify trends and support optimal decision making. - Formalize assumptions about how our systems are expected to work and develop methods to systematically identify high ROI improvements. About the team SPB Global Optimization (GO) team was created to accelerate growth in non-US markets. We are driving business growth across all marketplaces by creating delightful experiences for shoppers and advertisers alike. We are working backwards from customers to re-imagine Amazon's advertising stack from the ground up, leveraging GenAI to deliver solutions that scale across 20+ marketplaces from day one.
US, WA, Seattle
Unleash Your Potential as an AI Trailblazer At Amazon, we're on a mission to revolutionize the way people discover and access information. Our Applied Science team is at the forefront of this endeavor, pushing the boundaries of recommender systems and information retrieval. We're seeking brilliant minds to join us as interns and contribute to the development of cutting-edge AI solutions that will shape the future of personalized experiences. As an Applied Science Intern focused on Recommender Systems and Information Retrieval in Machine Learning, you'll have the opportunity to work alongside renowned scientists and engineers, tackling complex challenges in areas such as deep learning, natural language processing, and large-scale distributed systems. Your contributions will directly impact the products and services used by millions of Amazon customers worldwide. Imagine a role where you immerse yourself in groundbreaking research, exploring novel machine learning models for product recommendations, personalized search, and information retrieval tasks. You'll leverage natural language processing and information retrieval techniques to unlock insights from vast repositories of unstructured data, fueling the next generation of AI applications. Throughout your journey, you'll have access to unparalleled resources, including state-of-the-art computing infrastructure, cutting-edge research papers, and mentorship from industry luminaries. This immersive experience will not only sharpen your technical skills but also cultivate your ability to think critically, communicate effectively, and thrive in a fast-paced, innovative environment where bold ideas are celebrated. Join us at the forefront of applied science, where your contributions will shape the future of AI and propel humanity forward. Seize this extraordinary opportunity to learn, grow, and leave an indelible mark on the world of technology. Must be eligible and available for a full-time (40h / week) 12 week internship between May 2026 and September 2026 Amazon has positions available for Machine Learning Applied Science Internships in, but not limited to Arlington, VA; Bellevue, WA; Boston, MA; New York, NY; Palo Alto, CA; San Diego, CA; Santa Clara, CA; Seattle, WA. Key job responsibilities We are particularly interested in candidates with expertise in: Knowledge Graphs and Extraction, Programming/Scripting Languages, Time Series, Machine Learning, Natural Language Processing, Deep Learning,Neural Networks/GNNs, Large Language Models, Data Structures and Algorithms, Graph Modeling, Collaborative Filtering, Learning to Rank, Recommender Systems In this role, you'll collaborate with brilliant minds to develop innovative frameworks and tools that streamline the lifecycle of machine learning assets, from data to deployed models in areas at the intersection of Knowledge Management within Machine Learning. You will conduct groundbreaking research into emerging best practices and innovations in the field of ML operations, knowledge engineering, and information management, proposing novel approaches that could further enhance Amazon's machine learning capabilities. The ideal candidate should possess the ability to work collaboratively with diverse groups and cross-functional teams to solve complex business problems. A successful candidate will be a self-starter, comfortable with ambiguity, with strong attention to detail and the ability to thrive in a fast-paced, ever-changing environment. A day in the life - Design, implement, and experimentally evaluate new recommendation and search algorithms using large-scale datasets - Develop scalable data processing pipelines to ingest, clean, and featurize diverse data sources for model training - Conduct research into the latest advancements in recommender systems, information retrieval, and related machine learning domains - Collaborate with cross-functional teams to integrate your innovative solutions into production systems, impacting millions of Amazon customers worldwide - Communicate your findings through captivating presentations, technical documentation, and potential publications, sharing your knowledge with the global AI community
US, WA, Seattle
Shape the Future of Visual Intelligence Are you passionate about pushing the boundaries of computer vision and shaping the future of visual intelligence? Join Amazon and embark on an exciting journey where you'll develop cutting-edge algorithms and models that power our groundbreaking computer vision services, including Amazon Rekognition, Amazon Go, Visual Search, and more! At Amazon, we're combining computer vision, mobile robots, advanced end-of-arm tooling, and high-degree of freedom movement to solve real-world problems at an unprecedented scale. As an intern, you'll have the opportunity to build innovative solutions where visual input helps customers shop, anticipate technological advances, work with leading-edge technology, focus on highly targeted customer use-cases, and launch products that solve problems for Amazon customers worldwide. Throughout your journey, you'll have access to unparalleled resources, including state-of-the-art computing infrastructure, cutting-edge research papers, and mentorship from industry luminaries. This immersive experience will not only sharpen your technical skills but also cultivate your ability to think critically, communicate effectively, and thrive in a fast-paced, innovative environment where bold ideas are celebrated. Join us at the forefront of applied science, where your contributions will shape the future of AI and propel humanity forward. Seize this extraordinary opportunity to learn, grow, and leave an indelible mark on the world of technology. Amazon has positions available for Computer Vision Applied Science Internships in, but not limited to, Arlington, VA; Boston, MA; Cupertino, CA; Minneapolis, MN; New York, NY; Portland, OR; Santa Clara, CA; Seattle, WA; Bellevue, WA; Santa Clara, CA; Sunnyvale, CA. Key job responsibilities We are particularly interested in candidates with expertise in: Vision - Language Models, Object Recognition/Detection, Computer Vision, Large Language Models (LLMs), Programming/Scripting Languages, Facial Recognition, Image Retrieval, Deep Learning, Ranking, Video Understanding, Robotics In this role, you will work alongside global experts to develop and implement novel, scalable algorithms and modeling techniques that advance the state-of-the-art in areas of visual intelligence. You will tackle challenging, groundbreaking research problems to help build solutions where visual input helps the customers shop, anticipate technological advances, work with leading edge technology, focus on highly targeted customer use-cases, and launch products that solve problems for Amazon customers. The ideal candidate should possess the ability to work collaboratively with diverse groups and cross-functional teams to solve complex business problems. A successful candidate will be a self-starter, comfortable with ambiguity, with strong attention to detail and the ability to thrive in a fast-paced, ever-changing environment. A day in the life - Collaborate with Amazon scientists and cross-functional teams to develop and deploy cutting edge computer vision solutions into production. - Dive into complex challenges, leveraging your expertise in areas such as Vision-Language Models, Object Recognition/Detection, Large Language Models (LLMs), Facial Recognition, Image Retrieval, Deep Learning, Ranking, Video Understanding, and Robotics. - Contribute to technical white papers, create technical roadmaps, and drive production-level projects that will support Amazon Science. - Embrace ambiguity, strong attention to detail, and a fast-paced, ever-changing environment as you own the design and development of end-to-end systems. - Engage in knowledge-sharing, mentorship, and career-advancing resources to grow as a well-rounded professional.