Latency from post-quantum cryptography shrinks as data increases

Using time to last byte — rather than time to first byte — to assess the effects of data-heavy TLS 1.3 on real-world connections yields more encouraging results.

The risk that a quantum computer might break cryptographic standards widely used today has ignited numerous efforts to standardize quantum-resistant algorithms and introduce them into transport encryption protocols like TLS 1.3. The choice of post-quantum algorithm will naturally affect TLS 1.3’s performance. So far, studies of those effects have focused on the “handshake time” required for two parties to establish a quantum-resistant encrypted connection, known as the time to first byte.

Although these studies have been important in quantifying increases in handshake time, they do not provide a full picture of the effect of post-quantum cryptography on real-world TLS 1.3 connections, which often carry sizable amounts of data. At the 2024 Workshop on Measurements, Attacks, and Defenses for the Web (MADweb), we presented a paper advocating time to last byte (TTLB) as a metric for assessing the total impact of data-heavy, quantum-resistant algorithms such as ML-KEM and ML-DSA on real-world TLS 1.3 connections. Our paper shows that the new algorithms will have a much lower net effect on connections that transfer sizable amounts of data than they do on the TLS 1.3 handshake itself.

Post-quantum cryptography

TLS 1.3, the latest version of the transport layer security protocol, is used to negotiate and establish secure channels that encrypt and authenticate data passing between a client and a server. TLS 1.3 is used in numerous Web applications, including e-banking and streaming media.

Related content
Prize honors Amazon senior principal scientist and Penn professor for a protocol that achieves a theoretical limit on information-theoretic secure multiparty computation.

Asymmetric cryptographic algorithms, such as the one used in TLS 1.3, depend for their security on the difficulty of the discrete-logarithm or integer factorization problems, which a cryptanalytically relevant quantum computer could solve efficiently. The US National Institute of Standards and Technology (NIST) has been working on standardizing quantum-resistant algorithms and has selected ML-Key Encapsulation Mechanism (KEM) for key exchange. NIST has also selected ML-DSA for signatures, or cryptographic authentication.

As these algorithms have kilobyte-size public keys, ciphertexts, and signatures — versus the 50- to 400-byte sizes of the existing algorithms — they would inflate the amount of data exchanged in a TLS handshake. A number of works have compared handshake time using traditional TLS 1.3 key exchange and authentication to that using post-quantum (PQ) key exchange and authentication.

These comparisons were useful to quantify the overhead that each new algorithm introduces to the time to first byte, or completion of the handshake protocol. But they ignored the data transfer time over the secure connection that, together with the handshake time, constitutes the total delay before the application can start processing data. The total time from the start of the connection to the end of data transfer is, by contrast, the time to last byte (TTLB). How much TTLB slowdown is acceptable depends highly on the application.

Experiments

We designed our experiments to simulate various network conditions and measured the TTLB of classical and post-quantum algorithms in TLS 1.3 connections where the client makes a small request and the server responds with hundreds of kilobytes (KB) of data. We used Linux namespaces in a Ubuntu 22.04 virtual-machine instance. The namespaces were interconnected using virtual ethernet interfaces. To emulate the “network” between the namespaces, we used the Linux kernel’s netem utility, which can introduce variable network delays, bandwidth fluctuations, and packet loss between the client and server.

A standard AWS EC2 instance icon (which looks like a stylized integrated circuit) in which a netem emulation is running, with an emulated cloud server (represented by cloud icon) passing data back and forth with a server namespace (represented by a server-stack icon) and a client namespace (represented by a desktop-computer icon).
The experimental setup, with client and server Linux namespaces and netem-emulated network conditions.

Our experiments had several configurable parameters that allowed us to compare the effect of the PQ algorithm on TTLB under stable, unstable, fast, and slow network conditions:

  • TLS key exchange mechanism (classical ECDH or ECDH+ML-KEM post-quantum hybrid)
  • TLS certificate chain size corresponding to classical RSA or ML-DSA certificates.
  • TCP initial congestion window (initcwnd)
  • Network delay between client and server, or round-trip time (RTT)
  • Bandwidth between client and server
  • Loss probability per packet
  • Amount of data transferred from the server to the client

Results

The results of our testing are thoroughly analyzed in the paper. They essentially show that a few extra KB in the TLS 1.3 handshake due to the post-quantum public keys, ciphertexts, and signatures will not be noticeable in connections transferring hundreds of KB or more. Connections that transfer less than 10-20 KB of data will probably be more affected by the new data-heavy handshakes.

PQTLS fig. 1.png
Figure 1: Percentage increase in TLS 1.3 handshake time between traditional and post-quantum TLS 1.3 connections. Bandwidth = 1Mbps; loss probability = 0%, 1%, 3%, and 10%; RTT = 35ms and 200ms; TCP initcwnd=20.
A bar graph whose y-axis is "handshake time % increase" and whose x-axis is a sequence of percentiles (50th, 75th, and 90th). At each percentile are two bars, one blue (for the traditional handshake protocol) and one orange (for post-quantum handshakes). In all three instances, the orange bar is around twice as high as the blue one.

Figure 1 shows the percentage increase in the duration of the TLS 1.3 handshake for the 50th, 75th, and 90th percentiles of the aggregate datasets collected for 1Mbps bandwidth; 0%, 1%, 3%, and 10% loss probability; and 35-millisecond and 200-millisecond RTT. We can see that the ML-DSA size (16KB) certificate chain takes almost twice as much time as the 8KB chain. This means that if we manage to keep the volume of ML-DSA authentication data low, it would significantly benefit the speed of post-quantum handshakes in low-bandwidth connections.

A line graph whose y-axis is the time-to-last-byte (TTLB) percentage increase and whose x-axis is the size of the data files transmitted over the secure connection, ranging from 0 KiB to 200 KiB. There are three lines, representing the 50th, 75th, and 90th percentiles. They start at almost the same value and all drop precipitously from 0 KiB to 50 KiB, continuing to decline from 50 KiB to 200 KiB, with the 90th-percentile line declining slightly more rapidly than the other two.
Figure 2: Percentage increase in TTLB between existing and post-quantum TLS 1.3 connections at 0% loss probability. Bandwidth = 1Gbps; RTT = 35ms; TCP initcwnd = 20.

Figure 2 shows the percentage increase in the duration of the post-quantum handshake relative to the existing algorithm for all percentiles and different data sizes at 0% loss and 1Gbps bandwidth. We can observe that although the slowdown is low (∼3%) at 0 kibibytes (KiB, or multiples of 1,024 bytes, the nearest power of 2 to 1,000) from the server (equivalent to the handshake), it drops even more (∼1%) as the data from the server increases. At the 90th percentile the slowdown is slightly lower.

A line graph whose y-axis is the time-to-last-byte (TTLB) percentage increase and whose x-axis is the size of the data files transmitted over the secure connection, ranging from 0 KiB to 200 KiB. There are three lines, representing the 50th, 75th, and 90th percentiles. They start at exactly the same value and all decline in lockstep, dropping precipitously from 0 KiB to 50 KiB and continuing a steady decline from 50 KiB to 200 KiB.
Figure 3: Percentage increase in TTLB between existing and post-quantum TLS 1.3 connections at 0% loss probability. Bandwidth = 1Mbps; RTT = 200ms; TCP initcwnd = 20.

Figure 3 shows the percentage increase in the TTLB between existing and post-quantum TLS 1.3 connections carrying 0-200KiB of data from the server for each percentile at 1Mbps bandwidth, 200ms RTT, and 0% loss probability. We can see that increases for the three percentiles are almost identical. They start high (∼33%) at 0KiB from the server, but as the data size from the server increases, they drop to ∼6% because the handshake data size is amortized over the connection.

A line graph whose y-axis is the time-to-last-byte (TTLB) percentage increase and whose x-axis is the size of the data files transmitted over the secure connection, ranging from 0 KiB to 200 KiB. There are three lines, representing the 50th, 75th, and 90th percentiles. The 50th-percentile line drops precipitously from 0 KiB to 50 KiB, declines more gradually from 50 to 100, then increases slightly from 100 to 200. The 90th-percentile line starts much lower but increases slightly to 50 KiB, before declining to 100 and 200. The 75th-percentile line starts lower still, declines to 100 KiB, the increases slightly from 100 to 200.
Figure 4: Percentage increase in TTLB between existing and post-quantum TLS 1.3 connections. Loss = 10%; bandwidth = 1Mbps; RTT = 200ms; TCP initcwnd = 20.
Related content
Amazon is helping develop standards for post-quantum cryptography and deploying promising technologies for customers to experiment with.

Figure 4 shows the percentage increase in TTLB between existing and post-quantum TLS 1.3 connections carrying 0-200 KiB of data from the server for each percentile at 1Mbps bandwidth, 200ms RTT, and 10% loss probability. It shows that at 10% loss, the TTLB increase settles between 20-30% for all percentiles. The same experiments for 35ms RTT produced similar results. Although a 20-30% increase may seem high, we note that re-running the experiments could sometimes lead to smaller or higher percentage increases because of the general network instability of the scenario. Also, bear in mind that TTLBs for the existing algorithm at 200KiB from the server, 200ms RTT, and 10% loss were 4,644ms, 7,093ms, and 10,178ms, whereas their post-quantum-connection equivalents were 6,010ms, 8,883ms, and 12,378ms. At 0% loss they were 2,364ms, 2,364ms, and 2,364ms. So, although the TTLBs for the post-quantum connections increased by 20-30% relative to the conventional connections, the conventional connections are already impaired (by 97-331%) due to network loss. An extra 20-30% is not likely to make much difference in an already highly degraded connection time.

A line graph whose y-axis is the time-to-last-byte (TTLB) percentage increase and whose x-axis is the size of the data files transmitted over the secure connection, ranging from 0 KiB to 200 KiB. There are three lines, representing the 50th, 75th, and 90th percentiles. They start at different values but all decline precipitously from 0 KiB to 50 KiB. From 50KiB to 100 KiB, the 75th-percentile line and the 50th-percentile line continue to decline, but the 90th-percentile line increases slightly. All three increase slightly between 100 KiB and 200.
Figure 5: Percentage increase in TTLB between existing and post-quantum TLS 1.3 connections for 0% loss probability under “volatile network” conditions. Bandwidth = 1Gbps; RTT = 35ms; TCP initcwnd = 20.

Figure 5 shows the percentage increase in TTLB between existing and post-quantum TLS 1.3 connections for 0% loss probability and 0-200KiB data sizes transferred from the server. To model a highly volatile RTT, we used a Pareto-normal distribution with a mean of 35ms and 35/4ms jitter. We can see that the increase in post-quantum connection TTLB starts high at 0KiB server data and drops to 4-5%. As with previous experiments, the percentages were more volatile the higher the loss probabilities, but overall, the results show that even under “volatile network conditions” the TTLB drops to acceptable levels as the amount of transferred data increases.

A line graph whose y-axis is the cumulative distribution function (CDF), from 0.0 to 1.0, and whose x-axis is time to last byte (TTLB) in milliseconds. There are five differently colored lines. The first four all have the same round-trip time. Two of them have bandwidth of 1Gbps and two bandwidth of 1Mbps. Within each bandwidth tier, the two lines represent 0% and 5% loss. The fifth line is Pareto-normal round-trip time. The high-bandwidth lines and the Pareto-normal line all begin near the origin. The high-bandwidth, low-loss line is almost vertical, reaching 1.0 almost immediately. The high-bandwidth, high-loss line and Pareto-normal line look like offsets of each other, with the Pareto-normal line increasing at a slightly lower rate; both rise fairly quickly, reaching 0.8 at about 1,000 milliseconds. The low-bandwidth lines both begin at TTLB values of of about 2,000. Again, the low-loss line is almost vertical; the higher-loss line rises at a slower rate.
Figure 6: TTLB cumulative distribution function for post-quantum TLS 1.3 connections. 200KiB from the server; RTT = 35ms; TCP initcwnd = 20.

To confirm the volatility under unstable network conditions, we used the TTLB cumulative distribution function (CDF) for post-quantum TLS 1.3 connections transferring 200KiB from the server (figure 6). We observe that under all types of volatile conditions (1Gbps and 5% loss, 1Mbps and 10% loss, Pareto-normal distributed network delay), the TTLB increases very early in the experimental measurement sample, which demonstrates that the total connection times are highly volatile. We made the same observation with TLS 1.3 handshake times under unstable network conditions.

Conclusion

This work demonstrated that the practical effect of data-heavy, post-quantum algorithms on TLS 1.3 connections is lower than their effect on the handshake itself. Low-loss, low- or high-bandwidth connections will see little impact from post-quantum handshakes when transferring sizable amounts of data. We also showed that although the effects of PQ handshakes could vary under unstable conditions with higher loss rates or high-variability delays, they stay within certain limits and drop as the total amount of transferred data increases. Additionally, we saw that unstable connections inherently provide poor completion times; a small latency increase due to post-quantum handshakes would not render them less usable than before. This does not mean that trimming the amount of handshake data is undesirable, especially if little application data is sent relative to the size of the handshake messages.

For more details, please see our paper.

Related content

IN, KA, Bengaluru
Interested to build the next generation Financial systems that can handle billions of dollars in transactions? Interested to build highly scalable next generation systems that could utilize Amazon Cloud? Massive data volume + complex business rules in a highly distributed and service oriented architecture, a world class information collection and delivery challenge. Our challenge is to deliver the software systems which accurately capture, process, and report on the huge volume of financial transactions that are generated each day as millions of customers make purchases, as thousands of Vendors and Partners are paid, as inventory moves in and out of warehouses, as commissions are calculated, and as taxes are collected in hundreds of jurisdictions worldwide. Key job responsibilities • Understand the business and discover actionable insights from large volumes of data through application of machine learning, statistics or causal inference. • Analyse and extract relevant information from large amounts of Amazon’s historical transactions data to help automate and optimize key processes • Research, develop and implement novel machine learning and statistical approaches for anomaly, theft, fraud, abusive and wasteful transactions detection. • Use machine learning and analytical techniques to create scalable solutions for business problems. • Identify new areas where machine learning can be applied for solving business problems. • Partner with developers and business teams to put your models in production. • Mentor other scientists and engineers in the use of ML techniques. A day in the life • Understand the business and discover actionable insights from large volumes of data through application of machine learning, statistics or causal inference. • Analyse and extract relevant information from large amounts of Amazon’s historical transactions data to help automate and optimize key processes • Research, develop and implement novel machine learning and statistical approaches for anomaly, theft, fraud, abusive and wasteful transactions detection. • Use machine learning and analytical techniques to create scalable solutions for business problems. • Identify new areas where machine learning can be applied for solving business problems. • Partner with developers and business teams to put your models in production. • Mentor other scientists and engineers in the use of ML techniques. About the team The FinAuto TFAW(theft, fraud, abuse, waste) team is part of FGBS Org and focuses on building applications utilizing machine learning models to identify and prevent theft, fraud, abusive and wasteful(TFAW) financial transactions across Amazon. Our mission is to prevent every single TFAW transaction. As a Machine Learning Scientist in the team, you will be driving the TFAW Sciences roadmap, conduct research to develop state-of-the-art solutions through a combination of data mining, statistical and machine learning techniques, and coordinate with Engineering team to put these models into production. You will need to collaborate effectively with internal stakeholders, cross-functional teams to solve problems, create operational efficiencies, and deliver successfully against high organizational standards.
IN, KA, Bengaluru
Interested to build the next generation Financial systems that can handle billions of dollars in transactions? Interested to build highly scalable next generation systems that could utilize Amazon Cloud? Massive data volume + complex business rules in a highly distributed and service oriented architecture, a world class information collection and delivery challenge. Our challenge is to deliver the software systems which accurately capture, process, and report on the huge volume of financial transactions that are generated each day as millions of customers make purchases, as thousands of Vendors and Partners are paid, as inventory moves in and out of warehouses, as commissions are calculated, and as taxes are collected in hundreds of jurisdictions worldwide. Key job responsibilities • Understand the business and discover actionable insights from large volumes of data through application of machine learning, statistics or causal inference. • Analyse and extract relevant information from large amounts of Amazon’s historical transactions data to help automate and optimize key processes • Research, develop and implement novel machine learning and statistical approaches for anomaly, theft, fraud, abusive and wasteful transactions detection. • Use machine learning and analytical techniques to create scalable solutions for business problems. • Identify new areas where machine learning can be applied for solving business problems. • Partner with developers and business teams to put your models in production. • Mentor other scientists and engineers in the use of ML techniques. A day in the life • Understand the business and discover actionable insights from large volumes of data through application of machine learning, statistics or causal inference. • Analyse and extract relevant information from large amounts of Amazon’s historical transactions data to help automate and optimize key processes • Research, develop and implement novel machine learning and statistical approaches for anomaly, theft, fraud, abusive and wasteful transactions detection. • Use machine learning and analytical techniques to create scalable solutions for business problems. • Identify new areas where machine learning can be applied for solving business problems. • Partner with developers and business teams to put your models in production. • Mentor other scientists and engineers in the use of ML techniques. About the team The FinAuto TFAW(theft, fraud, abuse, waste) team is part of FGBS Org and focuses on building applications utilizing machine learning models to identify and prevent theft, fraud, abusive and wasteful(TFAW) financial transactions across Amazon. Our mission is to prevent every single TFAW transaction. As a Machine Learning Scientist in the team, you will be driving the TFAW Sciences roadmap, conduct research to develop state-of-the-art solutions through a combination of data mining, statistical and machine learning techniques, and coordinate with Engineering team to put these models into production. You will need to collaborate effectively with internal stakeholders, cross-functional teams to solve problems, create operational efficiencies, and deliver successfully against high organizational standards.
IN, KA, Bengaluru
Amazon Health Services (One Medical) About Us: At Health AI, we're revolutionizing healthcare delivery through innovative AI-enabled solutions. As part of Amazon Health Services and One Medical, we're on a mission to make quality healthcare more accessible while improving patient outcomes. Our work directly impacts millions of lives by empowering patients and enabling healthcare providers to deliver more meaningful care. Role Overview: We're seeking an Applied Scientist to join our dynamic team in building state of the art AI/ML solutions for healthcare. This role offers a unique opportunity to work at the intersection of artificial intelligence and healthcare, developing solutions that will shape the future of medical services delivery. Key job responsibilities • Lead end-to-end development of AI/ML solutions for Amazon Health organization, including Amazon Pharmacy and One Medical • Research, design, and implement state-of-the-art machine learning models, with a focus on Large Language Models (LLMs) and Visual Language Models (VLMs) • Optimize and fine-tune models for production deployment, including model distillation for improved latency • Drive scientific innovation while maintaining a strong focus on practical business outcomes • Collaborate with cross-functional teams to translate complex technical solutions into tangible customer benefits • Contribute to the broader Amazon Health scientific community and help shape our technical roadmap
US, CA, Pasadena
The Amazon Center for Quantum Computing in Pasadena, CA, is looking to hire an Applied Scientist specializing in Mixed-Signal Design. Working alongside other scientists and engineers, you will design and validate hardware performing the control and readout functions for AWS quantum processors. Candidates must have a solid background in mixed-signal design at the printed circuit board (PCB) level. Working effectively within a cross-functional team environment is critical. The ideal candidate will have demonstrated the capability to contribute to all phases of product life cycle development, from requirements gathering to verification. Diverse Experiences Amazon values diverse experiences. Even if you do not meet all of the preferred qualifications and skills listed in the job description, we encourage candidates to apply. If your career is just starting, hasn’t followed a traditional path, or includes alternative experiences, don’t let it stop you from applying. Work/Life Balance We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud. Inclusive Team Culture Here at Amazon, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empower us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences, inspire us to never stop embracing our uniqueness. Mentorship and Career Growth We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional. Key job responsibilities Our scientists and engineers collaborate across diverse teams and projects to offer state of the art, cost effective solutions for the control of Amazon quantum processor systems. You’ll bring a passion for innovation, collaboration, and mentoring to: Solve layered technical problems, often ones not encountered before, across our hardware stack. Develop requirements with key system stakeholders, including quantum device, test and measurement, and cryogenic hardware teams. Design, implement, test, deploy, and maintain innovative solutions that meet both strict performance and cost metrics. Research enabling control system technologies necessary for Amazon to produce commercially viable quantum computers.
US, MA, Boston
The Artificial General Intelligence (AGI) team is seeking a dedicated, skilled, and innovative Applied Scientist with a robust background in machine learning, statistics, quality assurance, auditing methodologies, and automated evaluation systems to ensure the highest standards of data quality, to build industry-leading technology with Large Language Models (LLMs) and multimodal systems. Key job responsibilities As part of the AGI team, an Applied Scientist will collaborate closely with core scientist team developing Amazon Nova models. They will lead the development of comprehensive quality strategies and auditing frameworks that safeguard the integrity of data collection workflows. This includes designing auditing strategies with detailed SOPs, quality metrics, and sampling methodologies that help Nova improve performances on benchmarks. The Applied Scientist will perform expert-level manual audits, conduct meta-audits to evaluate auditor performance, and provide targeted coaching to uplift overall quality capabilities. A critical aspect of this role involves developing and maintaining LLM-as-a-Judge systems, including designing judge architectures, creating evaluation rubrics, and building machine learning models for automated quality assessment. The Applied Scientist will also set up the configuration of data collection workflows and communicate quality feedback to stakeholders. An Applied Scientist will also have a direct impact on enhancing customer experiences through high-quality training and evaluation data that powers state-of-the-art LLM products and services. A day in the life An Applied Scientist with the AGI team will support quality solution design, conduct root cause analysis on data quality issues, research new auditing methodologies, and find innovative ways of optimizing data quality while setting examples for the team on quality assurance best practices and standards. Besides theoretical analysis and quality framework development, an Applied Scientist will also work closely with talented engineers, domain experts, and vendor teams to put quality strategies and automated judging systems into practice.
US, CA, San Francisco
Amazon launched the AGI Lab to develop foundational capabilities for useful AI agents. We built Nova Act - a new AI model trained to perform actions within a web browser. The team builds AI/ML infrastructure that powers our production systems to run performantly at high scale. We’re also enabling practical AI to make our customers more productive, empowered, and fulfilled. In particular, our work combines large language models (LLMs) with reinforcement learning (RL) to solve reasoning, planning, and world modeling in both virtual and physical environments. Our lab is a small, talent-dense team with the resources and scale of Amazon. Each team in the lab has the autonomy to move fast and the long-term commitment to pursue high-risk, high-payoff research. We’re entering an exciting new era where agents can redefine what AI makes possible. We’d love for you to join our lab and build it from the ground up! Key job responsibilities This role will lead a team of SDEs building AI agents infrastructure from launch to scale. The role requires the ability to span across ML/AI system architecture and infrastructure. You will work closely with application developers and scientists to have a impact on the Agentic AI industry. We're looking for a Software Development Manager who is energized by building high performance systems, making an impact and thrives in fast-paced, collaborative environments. About the team Check out the Nova Act tools our team built on on nova.amazon.com/act
US, CA, Santa Clara
Amazon Quick Suite is an enterprise AI platform that transforms how organizations work with their data and knowledge. Combining generative AI-powered search, deep research capabilities, intelligent agents and automations, and comprehensive business intelligence, Quick Suite serves tens of thousands of users. Our platform processes thousands of queries monthly, helping teams make faster, data-driven decisions while maintaining enterprise-grade security and governance. From natural language interactions with complex datasets to automated workflows and custom AI agents, Quick Suite is redefining workplace productivity at unprecedented scale. We are seeking a Data Scientist II to join our Quick Data team, focusing on evaluation and benchmarking data development for Quick Suite features, with particular emphasis on Research and other generative AI capabilities. Our mission is to engineer high-quality datasets that are essential to the success of Amazon Quick Suite. From human evaluations and Responsible AI safeguards to Retrieval-Augmented Generation and beyond, our work ensures that Generative AI is enterprise-ready, safe, and effective for users at scale. As part of our diverse team—including data scientists, engineers, language engineers, linguists, and program managers—you will collaborate closely with science, engineering, and product teams. We are driven by customer obsession and a commitment to excellence. Key job responsibilities In this role, you will leverage data-centric AI principles to assess the impact of data on model performance and the broader machine learning pipeline. You will apply Generative AI techniques to evaluate how well our data represents human language and conduct experiments to measure downstream interactions. Specific responsibilities include: * Design and develop comprehensive evaluation and benchmarking datasets for Quick Suite AI-powered features * Leverage LLMs for synthetic data corpora generation; data evaluation and quality assessment using LLM-as-a-judge settings * Create ground truth datasets with high-quality question-answer pairs across diverse domains and use cases * Lead human annotation initiatives and model evaluation audits to ensure data quality and relevance * Develop and refine annotation guidelines and quality frameworks for evaluation tasks * Conduct statistical analysis to measure model performance, identify failure patterns, and guide improvement strategies * Collaborate with ML scientists and engineers to translate evaluation insights into actionable product improvements * Build scalable data pipelines and tools to support continuous evaluation and benchmarking efforts * Contribute to Responsible AI initiatives by developing safety and fairness evaluation datasets About the team Why AWS? Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating — that’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses. Inclusive Team Culture Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empower us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon conferences, inspire us to never stop embracing our uniqueness. Mentorship & Career Growth We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional. Work/Life Balance We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud. Hybrid Work We value innovation and recognize this sometimes requires uninterrupted time to focus on a build. We also value in-person collaboration and time spent face-to-face. Our team affords employees options to work in the office every day or in a flexible, hybrid work model near one of our U.S. Amazon offices.
US, MA, Boston
The Artificial General Intelligence (AGI) team is seeking a dedicated, skilled, and innovative Applied Scientist with a robust background in machine learning, statistics, quality assurance, auditing methodologies, and automated evaluation systems to ensure the highest standards of data quality, to build industry-leading technology with Large Language Models (LLMs) and multimodal systems. Key job responsibilities As part of the AGI team, an Applied Scientist will collaborate closely with core scientist team developing Amazon Nova models. They will lead the development of comprehensive quality strategies and auditing frameworks that safeguard the integrity of data collection workflows. This includes designing auditing strategies with detailed SOPs, quality metrics, and sampling methodologies that help Nova improve performances on benchmarks. The Applied Scientist will perform expert-level manual audits, conduct meta-audits to evaluate auditor performance, and provide targeted coaching to uplift overall quality capabilities. A critical aspect of this role involves developing and maintaining LLM-as-a-Judge systems, including designing judge architectures, creating evaluation rubrics, and building machine learning models for automated quality assessment. The Applied Scientist will also set up the configuration of data collection workflows and communicate quality feedback to stakeholders. An Applied Scientist will also have a direct impact on enhancing customer experiences through high-quality training and evaluation data that powers state-of-the-art LLM products and services. A day in the life An Applied Scientist with the AGI team will support quality solution design, conduct root cause analysis on data quality issues, research new auditing methodologies, and find innovative ways of optimizing data quality while setting examples for the team on quality assurance best practices and standards. Besides theoretical analysis and quality framework development, an Applied Scientist will also work closely with talented engineers, domain experts, and vendor teams to put quality strategies and automated judging systems into practice.
US, MA, Boston
The Artificial General Intelligence (AGI) team is seeking a dedicated, skilled, and innovative Applied Scientist with a robust background in machine learning, statistics, quality assurance, auditing methodologies, and automated evaluation systems to ensure the highest standards of data quality, to build industry-leading technology with Large Language Models (LLMs) and multimodal systems. Key job responsibilities As part of the AGI team, an Applied Scientist will collaborate closely with core scientist team developing Amazon Nova models. They will lead the development of comprehensive quality strategies and auditing frameworks that safeguard the integrity of data collection workflows. This includes designing auditing strategies with detailed SOPs, quality metrics, and sampling methodologies that help Nova improve performances on benchmarks. The Applied Scientist will perform expert-level manual audits, conduct meta-audits to evaluate auditor performance, and provide targeted coaching to uplift overall quality capabilities. A critical aspect of this role involves developing and maintaining LLM-as-a-Judge systems, including designing judge architectures, creating evaluation rubrics, and building machine learning models for automated quality assessment. The Applied Scientist will also set up the configuration of data collection workflows and communicate quality feedback to stakeholders. An Applied Scientist will also have a direct impact on enhancing customer experiences through high-quality training and evaluation data that powers state-of-the-art LLM products and services. A day in the life An Applied Scientist with the AGI team will support quality solution design, conduct root cause analysis on data quality issues, research new auditing methodologies, and find innovative ways of optimizing data quality while setting examples for the team on quality assurance best practices and standards. Besides theoretical analysis and quality framework development, an Applied Scientist will also work closely with talented engineers, domain experts, and vendor teams to put quality strategies and automated judging systems into practice.
US, MA, Boston
The Artificial General Intelligence (AGI) team is seeking a dedicated, skilled, and innovative Applied Scientist with a robust background in machine learning, statistics, quality assurance, auditing methodologies, and automated evaluation systems to ensure the highest standards of data quality, to build industry-leading technology with Large Language Models (LLMs) and multimodal systems. Key job responsibilities As part of the AGI team, an Applied Scientist will collaborate closely with core scientist team developing Amazon Nova models. They will lead the development of comprehensive quality strategies and auditing frameworks that safeguard the integrity of data collection workflows. This includes designing auditing strategies with detailed SOPs, quality metrics, and sampling methodologies that help Nova improve performances on benchmarks. The Applied Scientist will perform expert-level manual audits, conduct meta-audits to evaluate auditor performance, and provide targeted coaching to uplift overall quality capabilities. A critical aspect of this role involves developing and maintaining LLM-as-a-Judge systems, including designing judge architectures, creating evaluation rubrics, and building machine learning models for automated quality assessment. The Applied Scientist will also set up the configuration of data collection workflows and communicate quality feedback to stakeholders. An Applied Scientist will also have a direct impact on enhancing customer experiences through high-quality training and evaluation data that powers state-of-the-art LLM products and services. A day in the life An Applied Scientist with the AGI team will support quality solution design, conduct root cause analysis on data quality issues, research new auditing methodologies, and find innovative ways of optimizing data quality while setting examples for the team on quality assurance best practices and standards. Besides theoretical analysis and quality framework development, an Applied Scientist will also work closely with talented engineers, domain experts, and vendor teams to put quality strategies and automated judging systems into practice.