The 10 most viewed blog posts of 2025

From quantum computing breakthroughs and foundation models for robotics to the evolution of Amazon Aurora and advances in agentic AI, these are the posts that captured readers' attention in 2025.

  1. Time series forecasting has undergone a transformation with the emergence of foundation models, moving beyond traditional statistical methods that extrapolate from single time series. Building on the success of the original Chronos models — which have been downloaded over 600 million times from Hugging Face — Amazon researchers introduce Chronos-2, designed to handle arbitrary forecasting tasks in a zero-shot manner through in-context learning (ICL).

    Chronos-2 pipeline
    The complete Chronos-2 pipeline. Input time series (targets and covariates) are first normalized using a robust scaling scheme, after which a time index and mask meta features are added. The resulting sequences are split into non-overlapping patches and mapped to high-dimensional embeddings via a residual network. The core transformer stack operates on these patch embeddings and produces multi-patch quantile outputs corresponding to the future patches masked out in the input. Each transformer block alternates between time and group attention layers: the time attention layer aggregates information across patches within a single time series, while the group attention layer aggregates information across all series within a group at each patch index. The figure illustrates two multivariate time series with one known covariate each, with corresponding groups highlighted in blue and red. This example is for illustration purposes only; Chronos-2 supports arbitrary numbers of targets and optional covariates.

    Unlike its predecessors, which supported only univariate forecasting, Chronos-2 can jointly predict multiple coevolving time series (multivariate forecasting) and incorporate external factors like promotional schedules or weather conditions (covariate-informed forecasting). For example, cloud operations teams can forecast CPU usage, memory consumption, and storage I/O together, while retailers can factor in planned promotions when predicting demand. The model's group attention mechanism enables it to capture complex interactions between variables, making it particularly valuable for cold-start scenarios where limited historical data is available.

  2. Quantum computing has long promised exponentially faster computation for certain problems, but quantum devices’ extreme sensitivity to environmental noise has limited practical applications. Amazon Web Services' new Ocelot chip represents a breakthrough in addressing this challenge. Ocelot uses bosonic quantum error correction based on "cat qubits", named after Schrödinger's famous thought experiment.

    1920x1080_Ocelot.jpg
    The pair of silicon microchips that compose the Ocelot logical-qubit memory chip.

    Traditional quantum error correction methods require thousands of physical qubits per logical qubit to achieve usable error rates, creating an enormous resource overhead. Ocelot's innovative architecture exponentially suppresses bit-flip errors at the physical level while using a simple repetition code to correct phase-flip errors. This approach achieves bit-flip times approaching one second — more than a thousand times longer than conventional superconducting qubits — while maintaining phase-flip times sufficient for error correction. The result is a distance-5 error-correcting code requiring only nine qubits total, versus 49 qubits for equivalent surface code implementations.

  3. As agentic AI systems move from concept to reality, fundamental scientific questions emerge about how these systems should share information and interact strategically. Amazon Scholar Michael Kearns explores several research frontiers that will shape the development of AI agents capable of acting autonomously on users' behalf.

    One intriguing question is what language agents will speak to each other. While agents must communicate with humans in natural language, agent-to-agent communication might be more efficient in the native "language" of neural networks: embeddings, where meanings are represented as vectors in a representational space. Just as websites today offer content in multiple human languages, we may see an "agentic Web" where content is pretranslated into standardized embeddings.

    Restaurant embeddings.jpg
    In this example, the red, green, and blue points are three-dimensional embeddings of restaurants at which three people (Alice, Bob, and Chris) have eaten. (A real-world embedding, by contrast, might have hundreds of dimensions.) Each glowing point represents the center of one of the clusters, and its values summarize the restaurant preferences of the corresponding person. AI agents could use such vector representations, rather than text, to share information with each other.

    Context sharing presents another challenge: agents must balance the benefits of sharing working memory with privacy concerns. When your travel agent negotiates with a hotel booking service, how much context about your preferences should it share — and how much financial information should it withhold?

  4. Inspired by how large language models are trained on diverse text corpora, Amazon researchers developed Mitra, a tabular foundation model pretrained entirely on synthetic datasets. While this may seem counterintuitive, real-world tabular data is often limited and heterogeneous, making it more practical to simulate diverse patterns that cover a wide range of possible data distributions.

    The key insight behind Mitra is that the quality of synthetic priors determines how well the model generalizes. Effective priors yield good performance on real tasks, exhibit diversity to prevent overfitting, and offer unique patterns not found elsewhere. Mitra's training mixture includes structural causal models — which combine graphs of causal dependencies with probabilistic equations — and popular tree-based methods like gradient boosting, random forests, and decision trees.

    Mitra overview.png
    Overview of the Mitra framework. We pretrain tabular foundation models (TFMs) on a mixture of synthetic data priors, including structural causal models and tree-based models. Each dataset is split into support and query sets. Mitra supports both 2-D attention across rows and columns and 1-D row-wise attention. At inference, the model conditions on support examples from real datasets to predict query labels using in-context learning (ICL) without gradient updates.

    Released as part of AutoGluon 1.4, Mitra demonstrates state-of-the-art performance through in-context learning: it can predict labels for new datasets when conditioned on a moderate number of examples, without requiring gradient updates or task-specific training.

  5. When Amazon Aurora launched in 2015, it promised to combine the cost effectiveness of MySQL with the performance of high-end commercial databases. The key innovation was decoupling computation from storage, a departure from traditional database architectures.

    By moving durability concerns to a separate, purpose-built storage service and offloading caching and logging layers to a scale-out, self-healing system, Aurora addressed the central constraint in cloud computing: the network. This service-oriented architecture protects databases from performance variance and failures while enabling independent scaling of performance, availability, and durability.

    Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases
    In their 2017 paper, Amazon researchers describe the architecture of Amazon Aurora.

    Over the past decade, Aurora has continued to evolve. Aurora Serverless, introduced in 2018, brought on-demand autoscaling that lets customers adjust computational capacity based on workload needs, using sophisticated resource management techniques including oversubscription, reactive control, and distributed decision making. As of May 2025, all Aurora offerings are now serverless: customers no longer need to choose specific server types or worry about underlying hardware, patching, or backups.

  6. Converting unstructured or poorly structured data into clean, schema-compliant records is a critical task across domains from healthcare to e-commerce. While large language models can perform this task when prompted with schema specifications, this approach has drawbacks: high costs at scale, complex prompts, and limitations on the complexity of the schemas.

    In a pair of recent papers, Amazon researchers introduced SoLM (the structured-object language model), a lightweight specialized model trained to generate objects in specific schemas using a novel self-supervised denoising method. Rather than training SoLM on clean examples, the researchers take existing structured records, introduce artificial noise, and train the model to recover the original forms. By making the noise increasingly aggressive — even completely removing structure or randomly shuffling tokens — the researchers enhance the model’s quality and teach it to operate on completely unstructured input.

    A key innovation is confidence-aware substructure beam search (CABS), which applies beam search at the level of key-value pairs rather than individual tokens, using a separately trained confidence model to predict each pair's probability. This approach dramatically improves accuracy while mitigating hallucination risks.

  7. Traditional embedding-based information retrieval compares a query vector to every possible response vector in a database, a time-consuming process as datasets grow. Amazon's GENIUS (generative universal multimodal search) model takes a different approach: instead of comparing vectors, it uses input queries to directly generate ID codes for data items.

    Comparison of search methods.png
    With embedding-based retrieval (a), a text embedding must be compared to every possible image embedding, or vice versa. With generative retrieval (b and c), by contrast, a retrieval model generates a single ID for each query. With GENIUS (c), the first digit of the ID code indicates the modality of the output.

    Presented at CVPR 2025, GENIUS is a multimodal model whose inputs and outputs can be any combination of images, texts, or image-text pairs. Two key innovations enable GENIUS's performance. The first is semantic quantization, where IDs are generated piecemeal, with each new ID segment focusing in more precisely on the target data item's location in the representational space. The second is query augmentation, which generates additional training queries by interpolating between initial queries and target IDs in the representational space, helping the model generalize to new data types.

  8. Foundation models have transformed language and computer vision, but their adoption in scientific domains like computational fluid dynamics has been slower. What will it take for them to play a more significant role in scientific applications?

    FoundationModels-AutoAirflow-16x9.general-purpose_caption.png
    A DrivAerML dataset surface plot of the normalized magnitude of wall shear stress (wall friction coefficient).

    To help answer this question, Amazon applied scientist Danielle Maddix Robinson explores foundation models’ application to time series forecasting, with both univariate and spatiotemporal data. Scientific foundation models face challenges that large language models don’t: severe data scarcity (since generating high-quality scientific data often requires expensive numerical simulations), the constraints of inviolable physical laws, and the need for robust uncertainty quantification in safety-critical applications.

    For univariate time series, Robinson and her colleagues address data scarcity with synthetic pretraining data. The resulting model demonstrated surprising strength on chaotic dynamical systems — not because it was designed for them but because of its ability to "parrot" past history without regressing to the mean, as classical methods do. For spatiotemporal forecasting in domains like weather prediction and aerodynamics, the researchers found important trade-offs between accuracy and memory consumption across different architectures, with some models better suited for short-term forecasts and others for long-term stability.

  9. Managing fleets of thousands of mobile robots in Amazon fulfillment centers requires predicting the robots’ future locations, to minimize congestion when assigning tasks and routes. But using the robots’ navigation algorithms to simulate their interactions faster than real time would be prohibitively resource intensive.

    Amazon's DeepFleet foundation models learn to predict robot locations from billions of hours of real-world navigation data collected from the million-plus robots deployed across Amazon fulfillment and sortation centers. Like language models that learn general competencies from diverse texts, DeepFleet learns general traffic flow patterns that enable it to quickly infer how situations will likely unfold and help assign tasks and route robots around congestion.

    Sample models of a fulfillment center (top) and a sortation center (bottom).
    Sample models of a fulfillment center (top) and a sortation center (bottom).

    Researchers experimented with four distinct model architectures — robot-centric, robot-floor, image-floor, and graph-floor — each offering a different answer to fundamental design questions: Should inputs represent individual robot states or whole-floor states? Should floor layouts be encoded as features, images, or graphs? How should time be handled?

  10. AI agents represent a leap forward in generative AI, a move from chat interfaces to systems that act autonomously on users' behalf — booking travel, making purchases, building software. But how do agentic systems actually work? Amazon vice president and distinguished engineer Marc Brooker demystifies the core components of agents and explains the design choices behind AWS's Bedrock AgentCore framework.

    At their heart, agents run models and tools in a loop to achieve goals. The user provides a goal; the agent uses an LLM to plan how to achieve it; and the agent repeatedly calls tools — databases, APIs, services — based on the model's instructions, updating its plan as it receives responses.

    But making such systems work in practice requires sophisticated infrastructure. AgentCore uses Firecracker microVMs to provide secure, efficient isolation for each agent session, with startup times measured in milliseconds and overhead as low as a few megabytes. The AgentCore Gateway service manages tool calls using standards like the model context protocol, translating between the LLM's outputs and tool input specifications. When no API exists for a needed action, Amazon's Nova Act enables computer use, letting agents interact with any website by pointing and clicking.

Related content

US, WA, Seattle
This role will contribute to developing the Economics and Science products and services in the Fee domain, with specialization in supply chain systems and fees. Through the lens of economics, you will develop causal links for how Amazon, Sellers and Customers interact. You will be a key and senior scientist, advising Amazon leaders how to price our services. You will work on developing frameworks and scalable, repeatable models supporting optimal pricing and policy in the two-sided marketplace that is central to Amazon's business. The pricing for Amazon services is complex. You will partner with science and technology teams across Amazon including Advertising, Supply Chain, Operations, Prime, Consumer Pricing, and Finance. We are looking for an experienced Economist to improve our understanding of seller Economics, enhance our ability to estimate the causal impact of fees, and work with partner teams to design pricing policy changes. In this role, you will provide guidance to scientists to develop econometric models to influence our fee pricing worldwide. You will lead the development of causal models to help isolate the impact of fee and policy changes from other business actions, using experiments when possible, or observational data when not. Key job responsibilities The ideal candidate will have extensive Economics knowledge, demonstrated strength in practical and policy relevant structural econometrics, strong collaboration skills, proven ability to lead highly ambiguous and large projects, and a drive to deliver results. They will work closely with Economists, Data / Applied Scientists, Strategy Analysts, Data Engineers, and Product leads to integrate economic insights into policy and systems production. Familiarity with systems and services that constitute seller supply chains is a plus but not required. About the team The Stores Economics and Sciences team is a central science team that supports Amazon's Retail and Supply Chain leadership. We tackle some of Amazon's most challenging economics and machine learning problems, where our mandate is to impact the business on massive scale.
US, WA, Bellevue
We are looking for detail-oriented, organized, and responsible individuals who are eager to learn how to apply their causal inference and/or structural econometrics skillsets to solve real world problems. The intern will work in the area of Economics Intelligence in Amazon Returns and Recommerce Technology and Innovation and develop new, data-driven solutions to support the most critical components of this rapidly scaling team. Our PhD Economist Internship Program offers hands-on experience in applied economics, supported by mentorship, structured feedback, and professional development. Interns work on real business and research problems, building skills that prepare them for full-time economist roles at Amazon and beyond. You will learn how to build data sets and perform applied econometric analysis collaborating with economists, scientists, and product managers. These skills will translate well into writing applied chapters in your dissertation and provide you with work experience that may help you with placement. These are full-time positions at 40 hours per week, with compensation being awarded on an hourly basis. About the team The WWRR Economics Intelligence (RREI) team brings together Economists, Data Scientists, and Business Intelligence Engineers experts to delivers economic solutions focused on forecasting, causality, attribution, customer behavior for returns, recommerce, and sustainability domains.
US, WA, Bellevue
We are looking for detail-oriented, organized, and responsible individuals who are eager to learn how to apply their causal inference and/or structural econometrics skillsets to solve real world problems. The intern will work in the area of Economics Intelligence in Amazon Returns and Recommerce Technology and Innovation and develop new, data-driven solutions to support the most critical components of this rapidly scaling team. Our PhD Economist Internship Program offers hands-on experience in applied economics, supported by mentorship, structured feedback, and professional development. Interns work on real business and research problems, building skills that prepare them for full-time economist roles at Amazon and beyond. You will learn how to build data sets and perform applied econometric analysis collaborating with economists, scientists, and product managers. These skills will translate well into writing applied chapters in your dissertation and provide you with work experience that may help you with placement. These are full-time positions at 40 hours per week, with compensation being awarded on an hourly basis. About the team The WWRR Economics Intelligence (RREI) team brings together Economists, Data Scientists, and Business Intelligence Engineers experts to delivers economic solutions focused on forecasting, causality, attribution, customer behavior for returns, recommerce, and sustainability domains.
US, WA, Seattle
Innovators wanted! Are you an entrepreneur? A builder? A dreamer? This role is part of an Amazon Special Projects team that takes the company’s Think Big leadership principle to the next level. We focus on creating entirely new products and services with a goal of positively impacting the lives of our customers. No industries or subject areas are out of bounds. If you’re interested in innovating at scale to address big challenges in the world, this is the team for you. As a Research Scientist, you will work with a unique and gifted team developing exciting products for consumers and collaborate with cross-functional teams. Our team rewards intellectual curiosity while maintaining a laser-focus in bringing products to market. Competitive candidates are responsive, flexible, and able to succeed within an open, collaborative, entrepreneurial, startup-like environment. At the intersection of both academic and applied research in this product area, you have the opportunity to work together with some of the most talented scientists, engineers, and product managers. Here at Amazon, we embrace our differences. We are committed to furthering our culture of inclusion. We have thirteen employee-led affinity groups, reaching 40,000 employees in over 190 chapters globally. We are constantly learning through programs that are local, regional, and global. Amazon’s culture of inclusion is reinforced within our 16 Leadership Principles, which remind team members to seek diverse perspectives, learn and be curious, and earn trust. Our team highly values work-life balance, mentorship and career growth. We believe striking the right balance between your personal and professional life is critical to life-long happiness and fulfillment. We care about your career growth and strive to assign projects and offer training that will challenge you to become your best.
US, WA, Seattle
Amazon has co-founded and signed The Climate Pledge, a commitment to reach net zero carbon by 2040. As a team, we leverage GenAI, sensors, smart home devices, cloud services, material science, and Alexa to build products that have a meaningful impact for customers and the climate. In alignment with this bold corporate goal, the Amazon Devices & Services organization is looking for a passionate, talented, and inventive Senior Applied Scientist to help build revolutionary products with potential for major societal impact. Great candidates for this position will have expertise in the areas of agentic AI applications, deep learning, time series analysis, LLMs, and multimodal systems. This includes experience designing autonomous AI agents that can reason, plan, and execute multi-step tasks, building tool-augmented LLM systems with access to external APIs and data sources, implementing multi-agent orchestration, and developing RAG architectures that combine LLMs with domain-specific knowledge bases. You will strive for simplicity and creativity, demonstrating high judgment backed by statistical proof. Key job responsibilities As a Senior Applied Scientist on the Energy Science team, you'll design and deploy agentic AI systems that autonomously analyze data, plan solutions, and execute recommendations. You'll build multi-agent architectures where specialized AI agents coordinate to solve complex optimization problems, and develop tool-augmented LLM applications that integrate with external data sources and APIs to deliver context-aware insights. Your work involves creating multimodal AI systems that synthesize diverse data streams, while implementing RAG pipelines that ground large language models in domain-specific knowledge bases. You'll apply advanced machine learning and deep learning techniques to time series analysis, forecasting, and pattern recognition. Beyond technical innovation, you'll drive end-to-end product development from research through production deployment, collaborating with cross-functional teams to translate AI capabilities into customer experiences. You'll establish rigorous experimentation frameworks to validate model performance and measure business impact, building AI-driven products with potential for major societal impact.
US, CA, San Francisco
Amazon launched the AGI Lab to develop foundational capabilities for useful AI agents. We built Nova Act - a new AI model trained to perform actions within a web browser. The team builds AI/ML infrastructure that powers our production systems to run performantly at high scale. We’re also enabling practical AI to make our customers more productive, empowered, and fulfilled. In particular, our work combines large language models (LLMs) with reinforcement learning (RL) to solve reasoning, planning, and world modeling in both virtual and physical environments. Our lab is a small, talent-dense team with the resources and scale of Amazon. Each team in the lab has the autonomy to move fast and the long-term commitment to pursue high-risk, high-payoff research. We’re entering an exciting new era where agents can redefine what AI makes possible. We’d love for you to join our lab and build it from the ground up! Key job responsibilities This role will lead a team of SDEs building AI agents infrastructure from launch to scale. The role requires the ability to span across ML/AI system architecture and infrastructure. You will work closely with application developers and scientists to have a impact on the Agentic AI industry. We're looking for a Software Development Manager who is energized by building high performance systems, making an impact and thrives in fast-paced, collaborative environments. About the team Check out the Nova Act tools our team built on on nova.amazon.com/act
US, WA, Seattle
MULTIPLE POSITIONS AVAILABLE Employer: AMAZON WEB SERVICES, INC. Offered Position: Applied Scientist III Job Location: Seattle, Washington Job Number: AMZ9674037 Position Responsibilities: Participate in the design, development, evaluation, deployment and updating of data-driven models and analytical solutions for machine learning (ML) and/or natural language (NL) applications. Develop and/or apply statistical modeling techniques (e.g. Bayesian models and deep neural networks), optimization methods, and other ML techniques to different applications in business and engineering. Routinely build and deploy ML models on available data, and run and analyze experiments in a production environment. Identify new opportunities for research in order to meet business goals. Research and implement novel ML and statistical approaches to add value to the business. Mentor junior engineers and scientists. Position Requirements: Master’s degree or foreign equivalent degree in Computer Science, Machine Learning, Engineering, or a related field and two years of research or work experience in the job offered, or as a Research Scientist, Research Assistant, Software Engineer, or a related occupation. Employer will accept a Bachelor’s degree or foreign equivalent degree in Computer Science, Machine Learning, Engineering, or a related field and five years of progressive post-baccalaureate research or work experience in the job offered or a related occupation as equivalent to the Master’s degree and two years of research or work experience. Must have one year of research or work experience in the following skill(s): (1) programming in Java, C++, Python, or equivalent programming language; and (2) conducting the analysis and development of various supervised and unsupervised machine learning models for moderately complex projects in business, science, or engineering. Amazon.com is an Equal Opportunity-Affirmative Action Employer – Minority / Female / Disability / Veteran / Gender Identity / Sexual Orientation. 40 hours / week, 8:00am-5:00pm, Salary Range $167,100/year to $226,100/year. Amazon is a total compensation company. Dependent on the position offered, equity, sign-on payments, and other forms of compensation may be provided as part of a total compensation package, in addition to a full range of medical, financial, and/or other benefits. For more information, visit: https://www.aboutamazon.com/workplace/employee-benefits.#0000
IN, KA, Bengaluru
Amazon Health Services (One Medical) About Us: At Health AI, we're revolutionizing healthcare delivery through innovative AI-enabled solutions. As part of Amazon Health Services and One Medical, we're on a mission to make quality healthcare more accessible while improving patient outcomes. Our work directly impacts millions of lives by empowering patients and enabling healthcare providers to deliver more meaningful care. Role Overview: We're seeking an Applied Scientist to join our dynamic team in building state of the art AI/ML solutions for healthcare. This role offers a unique opportunity to work at the intersection of artificial intelligence and healthcare, developing solutions that will shape the future of medical services delivery. Key job responsibilities • Lead end-to-end development of AI/ML solutions for Amazon Health organization, including Amazon Pharmacy and One Medical • Research, design, and implement state-of-the-art machine learning models, with a focus on Large Language Models (LLMs) and Visual Language Models (VLMs) • Optimize and fine-tune models for production deployment, including model distillation for improved latency • Drive scientific innovation while maintaining a strong focus on practical business outcomes • Collaborate with cross-functional teams to translate complex technical solutions into tangible customer benefits • Contribute to the broader Amazon Health scientific community and help shape our technical roadmap
US, MA, Boston
The Artificial General Intelligence (AGI) team is seeking a dedicated, skilled, and innovative Applied Scientist with a robust background in machine learning, statistics, quality assurance, auditing methodologies, and automated evaluation systems to ensure the highest standards of data quality, to build industry-leading technology with Large Language Models (LLMs) and multimodal systems. Key job responsibilities As part of the AGI team, an Applied Scientist will collaborate closely with core scientist team developing Amazon Nova models. They will lead the development of comprehensive quality strategies and auditing frameworks that safeguard the integrity of data collection workflows. This includes designing auditing strategies with detailed SOPs, quality metrics, and sampling methodologies that help Nova improve performances on benchmarks. The Applied Scientist will perform expert-level manual audits, conduct meta-audits to evaluate auditor performance, and provide targeted coaching to uplift overall quality capabilities. A critical aspect of this role involves developing and maintaining LLM-as-a-Judge systems, including designing judge architectures, creating evaluation rubrics, and building machine learning models for automated quality assessment. The Applied Scientist will also set up the configuration of data collection workflows and communicate quality feedback to stakeholders. An Applied Scientist will also have a direct impact on enhancing customer experiences through high-quality training and evaluation data that powers state-of-the-art LLM products and services. A day in the life An Applied Scientist with the AGI team will support quality solution design, conduct root cause analysis on data quality issues, research new auditing methodologies, and find innovative ways of optimizing data quality while setting examples for the team on quality assurance best practices and standards. Besides theoretical analysis and quality framework development, an Applied Scientist will also work closely with talented engineers, domain experts, and vendor teams to put quality strategies and automated judging systems into practice.
US, CA, Santa Clara
Amazon Quick Suite is an enterprise AI platform that transforms how organizations work with their data and knowledge. Combining generative AI-powered search, deep research capabilities, intelligent agents and automations, and comprehensive business intelligence, Quick Suite serves tens of thousands of users. Our platform processes thousands of queries monthly, helping teams make faster, data-driven decisions while maintaining enterprise-grade security and governance. From natural language interactions with complex datasets to automated workflows and custom AI agents, Quick Suite is redefining workplace productivity at unprecedented scale. We are seeking a Data Scientist II to join our Quick Data team, focusing on evaluation and benchmarking data development for Quick Suite features, with particular emphasis on Research and other generative AI capabilities. Our mission is to engineer high-quality datasets that are essential to the success of Amazon Quick Suite. From human evaluations and Responsible AI safeguards to Retrieval-Augmented Generation and beyond, our work ensures that Generative AI is enterprise-ready, safe, and effective for users at scale. As part of our diverse team—including data scientists, engineers, language engineers, linguists, and program managers—you will collaborate closely with science, engineering, and product teams. We are driven by customer obsession and a commitment to excellence. Key job responsibilities In this role, you will leverage data-centric AI principles to assess the impact of data on model performance and the broader machine learning pipeline. You will apply Generative AI techniques to evaluate how well our data represents human language and conduct experiments to measure downstream interactions. Specific responsibilities include: * Design and develop comprehensive evaluation and benchmarking datasets for Quick Suite AI-powered features * Leverage LLMs for synthetic data corpora generation; data evaluation and quality assessment using LLM-as-a-judge settings * Create ground truth datasets with high-quality question-answer pairs across diverse domains and use cases * Lead human annotation initiatives and model evaluation audits to ensure data quality and relevance * Develop and refine annotation guidelines and quality frameworks for evaluation tasks * Conduct statistical analysis to measure model performance, identify failure patterns, and guide improvement strategies * Collaborate with ML scientists and engineers to translate evaluation insights into actionable product improvements * Build scalable data pipelines and tools to support continuous evaluation and benchmarking efforts * Contribute to Responsible AI initiatives by developing safety and fairness evaluation datasets About the team Why AWS? Amazon Web Services (AWS) is the world’s most comprehensive and broadly adopted cloud platform. We pioneered cloud computing and never stopped innovating — that’s why customers from the most successful startups to Global 500 companies trust our robust suite of products and services to power their businesses. Inclusive Team Culture Here at AWS, it’s in our nature to learn and be curious. Our employee-led affinity groups foster a culture of inclusion that empower us to be proud of our differences. Ongoing events and learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon conferences, inspire us to never stop embracing our uniqueness. Mentorship & Career Growth We’re continuously raising our performance bar as we strive to become Earth’s Best Employer. That’s why you’ll find endless knowledge-sharing, mentorship and other career-advancing resources here to help you develop into a better-rounded professional. Work/Life Balance We value work-life harmony. Achieving success at work should never come at the expense of sacrifices at home, which is why we strive for flexibility as part of our working culture. When we feel supported in the workplace and at home, there’s nothing we can’t achieve in the cloud. Hybrid Work We value innovation and recognize this sometimes requires uninterrupted time to focus on a build. We also value in-person collaboration and time spent face-to-face. Our team affords employees options to work in the office every day or in a flexible, hybrid work model near one of our U.S. Amazon offices.