Reasoning ability doesn't automatically carry across modalities. A model that solves equations fluently in text can falter when the same problem is embedded in an interactive interface. Bridging that gap requires training that explicitly reinforces reasoning under interaction.

Portable reasoning: Releasing text-bound intelligence into agentic interaction

Large language models today can solve algebra, pass academic benchmarks, and generate highly structured chain-of-thought explanations. In text-only settings, they often feel startlingly intelligent: methodical, articulate, even strategic. But place those models inside an interactive environment, where they must click buttons, scroll pages, fill out forms, and submit answers, and their behavior changes. Their careful reasoning falters. They guess where they once deduced. They fall back on templates and produce limited procedural narration, stating what they see and what they will click next without first forming a structured plan and then acting on it. It’s as if part of their intelligence quietly goes offline the moment the cursor appears.

This discrepancy reveals a critical limitation: Reasoning ability doesn’t seamlessly carry across modalities. A model that can reason effectively when a question is presented as plain text does not necessarily reason as effectively when the same question is embedded inside an interactive interface. When the problem appears in a text-only prompt, the model’s objective is clear: interpret the question, deliberate, and produce an answer. But when the identical question is rendered inside a webpage — with visual layout, HTML structure, input fields, and the requirement to click or type — the cognitive demands change. The model must parse the UI, decide how to act within it, and manage state transitions, all while preserving the underlying reasoning process. In practice, this shift in modality often disrupts reasoning. Bridging this gap requires more than additional demonstrations. It requires training that explicitly reinforces reasoning under interaction. This is where reasoning reinforcement learning (Reasoning RL) becomes essential.

Reasoning is the stability layer beneath agentic behavior

Reasoning is not an optional enhancement layered onto language models; it is the core capability that enables planning, adaptation, and generalization. Strong reasoning underpins the ability to decompose complex goals into manageable steps, recover from mistakes, adapt to changing interface states, and handle tasks that deviate from familiar templates. Without it, models tend to overfit to narrow benchmarks and surface patterns.

Web environments are interactive, stateful, and often require exploration beyond familiar patterns. Every click reshapes the interface. Every new page updates the underlying state. A misread button or overlooked field can quietly snowball into total task failure. If we expect agents to handle real-world workflows reliably — booking travel, managing dashboards, navigating enterprise tools — their reasoning can’t just survive these dynamics; it has to remain stable and even sharpen under interactive pressure.

Same model, same question — when intelligence fails to transfer

During large-scale pretraining and continued fine-tuning, models are exposed to canonical academic datasets such as GSM8K, MMLU, MMMU, and ChartQA. These datasets equip models with substantial world knowledge and reasoning skills. In text-only evaluations, the results are strong. Present a mathematical equation as plain text, and the model produces a coherent chain-of-thought and a correct solution.

In interactive environments, reasoning that was stable in text becomes precarious. The model still has the knowledge — but the shift in modality disrupts its ability to deploy it.

However, when the identical problem is embedded inside an interactive webpage, performance drops sharply. Here the problem is rendered in HTML, and the model must read the question from the page, click into an input box, and type the answer. Instead of solving the equation, the agent often generates procedural narration: it acknowledges the presence of an input field and declares its intention to type the answer, but omits meaningful symbolic reasoning.

The knowledge is still encoded in the model’s weights. The reasoning patterns were learned during pretraining, but in agent mode, they fail to activate properly. This isn’t a loss of intelligence, but a failure of transfer across modalities. The shift from text-only prompts to interactive, multimodal environments disrupts the deployment of reasoning capabilities. Understanding and addressing this gap became a focus of our experiments within Amazon AGI.

Diagnosing the modality gap with interactive benchmarks

To systematically investigate the problem, we built agentic versions of academic benchmarks: interactive “reasoning gyms” that wrap canonical datasets inside controlled environments. These gyms preserve the core intellectual challenge of the task while introducing interaction. Questions are rendered in webpages. Answers must be submitted through input fields. The model operates in agent mode rather than pure text generation.

Sample task displayed in the math gym interface.

We built environments for multiple classic benchmarks, and each benchmark was evaluated in multiple configurations: traditional text-only zero-shot prompts, few-shot prompts where applicable, fully rendered gym environments, and variants where the question text was explicitly included in the prompt to isolate potential OCR issues.
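As a rough illustration of how a benchmark item could be wrapped this way, the sketch below renders a question into a small HTML page, accepts "type" and "click" actions, and builds prompts for three of the configurations described above. Everything in it is hypothetical: `MathGymTask`, `build_prompt`, the action dictionary format, and the configuration labels are names invented for this example, not the actual gym implementation.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class MathGymTask:
    """Hypothetical wrapper that turns one benchmark item into a web task."""
    question: str
    answer: str
    typed: str = ""          # current contents of the input field
    submitted: bool = False

    def render(self) -> str:
        # In the gym setting the question lives in the page, not in the prompt.
        return (
            "<html><body>"
            f"<p id='question'>{escape(self.question)}</p>"
            f"<input id='answer' value='{escape(self.typed)}'>"
            "<button id='submit'>Submit</button>"
            "</body></html>"
        )

    def step(self, action: dict) -> str:
        # Two illustrative actions: type into the field, click submit.
        if action["type"] == "type" and action["target"] == "answer":
            self.typed = action["text"]
        elif action["type"] == "click" and action["target"] == "submit":
            self.submitted = True
        return self.render()

    def is_correct(self) -> bool:
        return self.submitted and self.typed.strip() == self.answer.strip()


def build_prompt(task: MathGymTask, config: str) -> str:
    """Assumed labels for three of the evaluation configurations."""
    if config == "text_only":
        return f"Question: {task.question}\nAnswer:"
    if config == "gym":
        return "Complete the task shown on the page.\n" + task.render()
    if config == "gym_with_question":
        # Question repeated in the prompt to rule out page-reading errors.
        return (f"Question: {task.question}\n"
                "Submit the answer on the page.\n" + task.render())
    raise ValueError(f"unknown config: {config}")
```

Comparing accuracy under `gym` and `gym_with_question` is one way to separate failures of reading the page from failures of reasoning while acting.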

The results clearly exposed the modality gap. On MATH, for instance, text-only zero-shot performance substantially exceeded gym performance. On GSM8K, text-only accuracy was high, but performance in the interactive environment collapsed. The model could solve the problems in principle; it simply struggled to reason when required to act within a webpage. This gap suggested that supervised fine-tuning alone was insufficient. We needed a training signal that directly reinforced reasoning under interaction.

Mathematics as a launchpad for agentic intelligence

We began by applying reinforcement learning directly to the training split of MATH gym (agentified questions from the MATH dataset). Instead of teacher-forcing step-by-step solutions, we required the model to generate fully on-policy rollouts — reasoning, acting, observing, and adapting within the live interface. Rewards were issued only when a trajectory ended in a correct submission.
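The sketch below illustrates the shape of such a rollout with an outcome-only reward: the policy's own actions drive the episode, and a reward of 1 is issued only if the episode ends in a correct submission. It is a toy under stated assumptions. `run_episode`, `toy_policy`, the observation string, and the action format are hypothetical, and the policy-gradient update that would consume these rewards is omitted.

```python
from typing import Callable, Dict, List, Tuple

Action = Dict[str, str]


def run_episode(policy: Callable[[str], Action], question: str, answer: str,
                max_steps: int = 5) -> Tuple[List[Action], float]:
    """Roll out the policy's own actions; reward arrives only at the end."""
    typed = ""
    trajectory: List[Action] = []
    for _ in range(max_steps):
        observation = f"question={question!r} input={typed!r}"
        action = policy(observation)   # chosen by the current policy, no teacher forcing
        trajectory.append(action)
        if action["type"] == "type":
            typed = action["text"]
        elif action["type"] == "submit":
            # Sparse outcome reward: 1.0 for a correct submission, else 0.0.
            return trajectory, float(typed.strip() == answer)
    return trajectory, 0.0             # never submitted: no reward


def toy_policy(observation: str) -> Action:
    """Stand-in for a model: type an answer if the field is empty, then submit."""
    if "input=''" in observation:
        return {"type": "type", "text": "5"}
    return {"type": "submit"}
```

Because intermediate steps earn nothing, the only way to collect reward is to carry the reasoning all the way through to a correct submission in the interface.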


The early results were encouraging. After roughly one epoch over just a few thousand questions, the gap between text-only evaluation and gym evaluation shrank drastically. The model learned to parse rendered equations, carry out symbolic reasoning while navigating the page, and type correct answers into the input box. The same reasoning that previously faltered under interaction began to hold steady across perception, action, and state updates.

More surprisingly, the gains were not confined to mathematics. Although the reinforcement learning tasks focused solely on mathematics, improvements carried over to entirely different domains. Performance also improved on agentic MMLU tasks, which assess high school- to college-level world knowledge and reasoning across subjects such as history, economics, law, medicine, biology, physics, and other academic disciplines.

This suggests the improvement wasn’t just domain-specific memorization or narrow skill tuning. The model didn’t just get better at math — it became better at reasoning in interactive settings. By learning to think, act, and adapt coherently within a structured environment, it developed skills that transfer to other tasks requiring state tracking, careful reading, and deliberate decision-making. Sharpening the model on MATH produced meaningful spillover effects, strengthening its agentic abilities across a broad range of domains.

Beyond math: A reasoning curriculum

Training web agents ultimately requires reinforcement learning on full, end-to-end interactive workflows. In these settings, the model is expected to navigate real webpages, fill out forms, apply filters, and scroll through dynamically loaded content. Encouraged by early experiment results and aiming to further advance the model’s agentic intelligence, we decided to introduce a dedicated reasoning RL phase prior to the web-task reinforcement learning stage. In this phase, rather than optimizing over step-level instruction execution, we confined training to domains with precise, automatically verifiable answers. This allowed us to shape the model’s internal reasoning process — problem decomposition, intermediate deduction, and self-verification — without the confounding noise of complex UI interaction. By strengthening this cognitive substrate in isolation, we ensured that subsequent web-task RL would build on a more deliberate and structured reasoning policy.

After reasoning RL training, the same capabilities stabilize. The model learns not just to act but to reason while acting — and that stability transfers across domains.

We expanded the training curriculum to include multiple reasoning domains with reliable verification. In mathematics, we added increasingly difficult competition problems from AMC and AIME to encourage deeper logical deduction and structured skill progression. We added coding tasks from MBPP (training split) to develop procedural and algorithmic reasoning. We also incorporated structured information understanding tasks, including ChartQA, WikiTable extraction, and scientific question answering, that require interpreting tabular and visual inputs. These environments strengthen the model’s capacity for grounded quantitative reasoning, such as extracting key values, comparing magnitudes, and inferring patterns from structured data. They encourage robust grounding by forcing the model to tie its answers directly to structured evidence.

All tasks share a key advantage: clear ground truth and dependable reward signals. Because the correct outcomes are easy to verify and inexpensive to scale, they provide an efficient source of tasks for training and systematic evaluations.
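To make "easy to verify" concrete, here is a minimal sketch of an automatic answer checker of the kind such tasks permit. It is an assumption for illustration, not the verifier used in this work: `normalize` and `verify` are hypothetical names, and real checkers for code or chart tasks would need more (for example, running unit tests in a sandbox).

```python
from fractions import Fraction


def normalize(answer: str):
    """Map equivalent surface forms ("0.50", "1/2") to one canonical value."""
    text = answer.strip().replace(",", "").rstrip(".")
    try:
        return Fraction(text)          # exact arithmetic for numeric answers
    except (ValueError, ZeroDivisionError):
        return text.lower()            # fall back to case-insensitive text


def verify(predicted: str, reference: str) -> float:
    """Binary reward: 1.0 if the answers match after normalization."""
    return 1.0 if normalize(predicted) == normalize(reference) else 0.0
```

For example, `verify("0.50", "1/2")` and `verify("1,000", "1000")` both count as matches, while `verify("7", "8")` does not.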

As training progressed, we observed qualitative changes. The model began generating longer, more detailed chains-of-thought. It became more willing to backtrack when intermediate deductions failed. It exhibited stronger schema understanding when parsing tables and improved quantitative interpretation of charts. Importantly, these improvements generalized across domains.

Stability and on-policy training

Maintaining stability during RL was essential. We relied on on-policy training to ensure that reasoning traces reflected the model’s own internal state rather than teacher-forced guidance. At the same time, mechanisms such as KL regularization helped prevent reward collapse and excessive invalid actions. Preserving sufficient entropy during this first stage, the reasoning RL phase, was critical to maintaining exploration capacity for the subsequent large-scale web RL. The outcome was a base policy that not only strengthened core agentic intelligence but also consistently outperformed pure supervised fine-tuning across real-world web workflows.
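The exact objective is not spelled out here, so the following is only a sketch of one common way these two quantities are computed: a task reward penalized by the KL divergence from a reference policy, alongside the entropy of the policy's action distribution. The function names, the per-step categorical distributions, and the coefficient `beta` are all assumptions made for illustration.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a categorical action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def kl_divergence(policy: Sequence[float], reference: Sequence[float]) -> float:
    """KL(policy || reference); assumes reference > 0 wherever policy > 0."""
    return sum(p * math.log(p / q) for p, q in zip(policy, reference) if p > 0)


def regularized_reward(task_reward: float, policy: Sequence[float],
                       reference: Sequence[float], beta: float = 0.1) -> float:
    """Task reward minus a KL penalty that discourages drifting from the reference."""
    return task_reward - beta * kl_divergence(policy, reference)
```

In this framing, a policy that collapses onto one action has entropy near zero and a growing KL penalty, which is the kind of drift that regularization is meant to discourage.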

Structured reasoning in the wild

The benefits of reasoning-focused reinforcement learning extend beyond controlled reasoning gym environments and become especially evident in real-world web workflows. Consider a multi-step task that involves searching, selecting dates, navigating listings, scrolling, and inspecting a detailed amenities section. In such scenarios, a baseline agent trained directly with RL on web workflow tasks often overfits to superficial chain-of-thought templates rather than developing robust reasoning capabilities. As a result, it may misinterpret the task requirements, prematurely conclude that it has completed the task, or return information without verifying the current page state.

In contrast, an agent trained with reasoning RL demonstrates more deliberate and state-aware behavior. It checks for and handles pop-up windows, reflects on the outcomes of its actions, and explicitly inspects relevant sections of the page before proceeding. Rather than following memorized navigation patterns, it interprets the task requirements and validates that the necessary conditions are met before returning an answer. The key difference is the emergence of structured, context-sensitive reasoning grounded in the current state of the environment.

This contrast becomes even clearer in tasks that require more precise interpretation of page content. For example, in a workflow that involved counting reviews containing a specific term, the baseline agent again exhibited brittle behavior: it scrolled aimlessly, failed to isolate the relevant information, and ultimately terminated with an error. In contrast, the agent trained with reasoning-focused RL approached the task methodically. It recognized when critical information was not immediately visible, navigated deliberately to the appropriate sections, and refined its search. Rather than executing arbitrary action sequences, it formed and tested hypotheses, using intermediate observations to guide subsequent steps. This pattern of deliberate exploration and verification further illustrates how reasoning RL promotes coherent, state-aware problem solving rather than superficial pattern matching.

Reasoning as a reinforced habit

During large-scale pretraining, models internalize latent reasoning patterns across text, code, and instruction data. However, downstream fine-tuning for agent efficiency can suppress these patterns, encouraging shorter, more procedural outputs. Reasoning RL reactivates and amplifies these latent capabilities by rewarding structured, goal-directed reasoning under interaction.

The training loop repeatedly reinforces a pattern: observe the environment, think through its implications, execute a targeted action, verify the result, and adjust if necessary. Over time, this loop becomes internalized. The model stops treating webpages as scripts to execute and begins treating them as environments to reason within.
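The loop described above can be sketched in a few lines. This toy is purely illustrative: the page dictionary, the `apply` environment, and `run_loop` are hypothetical, and the "thinking" step is a hand-written rule standing in for a model's reasoning. The point is the verify-and-adjust step: the agent notices that an action had no effect and changes course.

```python
from typing import Dict, List


def apply(page: Dict[str, object], action: str) -> None:
    """Toy environment: typing and submitting are ignored while a pop-up is open."""
    if action == "dismiss_popup":
        page["popup_open"] = False
    elif action == "type" and not page["popup_open"]:
        page["field"] = page["goal"]
    elif action == "submit" and not page["popup_open"]:
        page["submitted"] = True


def run_loop(page: Dict[str, object], max_steps: int = 6) -> List[str]:
    log: List[str] = []
    blocked = False                      # belief updated by the verification step
    for _ in range(max_steps):
        if page["submitted"]:
            break
        # Observe and think: pick a targeted action from the current state.
        if blocked:
            action = "dismiss_popup"     # adjust: the last action had no effect
        elif page["field"] != page["goal"]:
            action = "type"
        else:
            action = "submit"
        before = dict(page)
        apply(page, action)              # act
        blocked = (page == before)       # verify: did the state actually change?
        log.append(action)
    return log
```

Starting from a page with an open pop-up, the loop tries to type, detects that nothing changed, dismisses the pop-up, and only then types and submits.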

Reliable agents start with portable reasoning

The key lesson is that modality alignment depends on deliberately strengthening reasoning capabilities. Before scaling complex web RL on open-ended tasks, it is beneficial to first reinforce the model’s reasoning substrate in controlled, verifiable domains. A structured curriculum of reasoning gyms enables reliable transfer across modalities, restores suppressed capabilities, and promotes cross-domain generalization — ultimately producing a more stable and intelligent base policy for subsequent web-scale training.

Reliable agents do not emerge solely from larger models or more trajectories. They emerge when reasoning is deliberately strengthened as a fundamental skill for web interactions. If we want agents that can plan, recover, adapt, and generalize in real-world workflows, we must train them not just to act but to reason while acting. Reasoning RL is not an auxiliary optimization; it is a foundational step toward agentic intelligence that is coherent, transferable, and robust.

Meiqi Sun joins cognitive scientist Dr. Danielle Perszyk to discuss the shift from simple action execution to high-reasoning agents and why teaching models to reason requires letting them struggle.

Related content

US, CA, Sunnyvale
The Artificial General Intelligence (AGI) team is looking for a passionate, talented, and inventive Member of Technical Staff with a strong deep learning background, to build industry-leading Generative Artificial Intelligence (GenAI) technology with Large Language Models (LLMs) and multimodal systems. Key job responsibilities As a Member of Technical Staff with the AGI team, you will lead the development of algorithms and modeling techniques, to advance the state of the art with LLMs. You will lead the foundational model development in an applied research role, including model training, dataset design, and pre- and post-training optimization. Your work will directly impact our customers in the form of products and services that make use of GenAI technology. You will leverage Amazon’s heterogeneous data sources and large-scale computing resources to accelerate advances in LLMs. About the team The AGI team has a mission to push the envelope in GenAI with LLMs and multimodal systems, in order to provide the best-possible experience for our customers.
GB, London
We are looking for an Economist to work on exciting and challenging business problems related to Amazon Retail’s worldwide product assortment. You will build innovative solutions based on econometrics, machine learning, and experimentation. You will be part of a interdisciplinary team of economists, product managers, engineers, and scientists, and your work will influence finance and business decisions affecting Amazon’s vast product assortment globally. If you have an entrepreneurial spirit, you know how to deliver results fast, and you have a deeply quantitative, highly innovative approach to solving problems, and long for the opportunity to build pioneering solutions to challenging problems, we want to talk to you. Key job responsibilities * Work on a challenging problem that has the potential to significantly impact Amazon’s business position * Develop econometric models and experiments to measure the customer and financial impact of Amazon’s product assortment * Collaborate with other scientists at Amazon to deliver measurable progress and change * Influence business leaders based on empirical findings
US, NY, New York
Amazon Advertising is one of Amazon's fastest growing and most profitable businesses. Our products are used daily to surface new selection and provide customers a wider set of product choices along their shopping journeys. The business is focused on generating value for shoppers as well as advertisers. Our team uses a combination of econometrics, machine learning, and data science to build disruptive products for all our Advertising products. We also generate insights to guide Amazon Advertising strategy, providing direct support to senior leadership. We are looking for an experienced Economist with a deep passion for building econometric solutions and the ability to communicate data insights and scientific vision to execute on strategic projects. Key job responsibilities - Leverage econometrics and ML models to optimize advertising strategies on behalf of our customers. - Influence key business and product decisions based on insights from models you develop. - Perform hands-on analysis and modeling with enormous data sets to develop insights that increase traffic monetization and merchandise sales without compromising shopper experience. - Work closely with software engineers on detailed requirements to productionize the models you build. - Run A/B experiments that affect hundreds of millions of customers, evaluate the impact of your optimizations and communicate your results to various business stakeholders. - Work with other scientists, software developers, and product partners to implement your solutions.
US, NY, New York
About Sponsored Products and Brands: The Sponsored Products and Brands (SPB) organization at Amazon Ads is re-imagining the advertising landscape through industry leading generative AI technologies, revolutionizing how millions of customers discover products and engage with brands across Amazon.com and beyond. We are at the forefront of re-inventing advertising experiences, bridging human creativity with artificial intelligence to transform every aspect of the advertising lifecycle from ad creation and optimization to performance analysis and customer insights. We are a passionate group of innovators dedicated to developing responsible and intelligent AI technologies that balance the needs of advertisers, enhance the shopping experience, and strengthen the marketplace. If you're energized by solving complex challenges and pushing the boundaries of what's possible with AI, join us in shaping the future of advertising. About Our Team: The Brand Beacon team is responsible for inventing impressions offerings for brands to increase share of voice via premium experiences, helping brands get discovered, acquire new customers and sustainably grow customer lifetime value. We build end-to-end solutions that enable brands to drive discovery, visibility and share of voice. This includes building advertiser controls, shopper experiences, monetization strategies and optimization features. We succeed when (1) shoppers discover, engage and build affinity with brands and (2) brands can grow their business at scale with our advertising products. About This Role: As a Senior Scientist for the team, you will have the opportunity to apply your deep subject matter expertise in the area of ML, LLM and GenAI models. You will invent new product experiences that enable novel advertiser and shopper experiences. 
This role will liaise with internal Amazon partners and work on bringing state-of-the-art GenAI models to production, and stay abreast of the latest developments in the space of GenAI and identify opportunities to improve the efficiency and productivity of the team. Additionally, you will define a long-term science vision for our advertising business, driven by our customer’s needs, and translate it into actionable plans for our team of applied scientists and engineers. This role will play a critical role in elevating the team’s scientific and technical rigor, identifying and implementing best-in-class algorithms, methodologies, and infrastructure that enable rapid experimentation and scaling. You will communicate learnings to leadership and mentor and grow Applied AI talent across org. * Develop AI solutions for advertiser and shopper experiences. Build monetization and optimization systems that leverage generative models to value and improve campaign performance. * Define a long-term science vision and roadmap for our advertising business, driven from our customers' needs, translating that direction into specific plans for applied scientists and engineering teams. This role combines science leadership, organizational ability, technical strength, product focus, and business understanding. * Design and conduct A/B experiments to evaluate proposed solutions based on in-depth data analyses. * Effectively communicate technical and non-technical ideas with teammates and stakeholders. * Stay up-to-date with advancements and the latest modeling techniques in the field. * Think big about the arc of development of Gen AI over a multi-year horizon and identify new opportunities to apply these technologies to solve real-world problems. #GenAI
US, WA, Seattle
Amazon Industrial Robotics is seeking exceptional applied science talent to develop AI and machine learning systems that will enable the next generation of advanced manufacturing capabilities at unprecedented scale. We're building revolutionary software infrastructure that combines cutting-edge AI, large-scale optimization, and advanced manufacturing processes to create adaptive production control systems. As a Senior Applied Scientist, you will develop and improve machine learning systems that enable real-time manufacturing flow decisions. You will leverage state-of-the-art optimization and ML techniques, evaluate them against representative manufacturing scenarios, and adapt them to meet the robustness, reliability, and performance needs of production environments. You will invent new algorithms where gaps exist. You'll collaborate closely with software engineering, manufacturing engineering, robotics simulation, and operations teams, and your outputs will directly power the systems that determine what to build next, where to allocate resources, and how to maximize throughput. The ideal candidate brings deep expertise in optimization and machine learning, with a proven track record of delivering scientifically complex solutions into production. You are hands-on, writing significant portions of critical-path scientific code while driving your team's scientific agenda. If you're passionate about inventing the intelligent manufacturing systems of tomorrow rather than optimizing those of today, this role offers the chance to make a lasting impact on the future of automation. 
Key job responsibilities - Identify and devise new scientific approaches for constraint identification, dispatch optimization, WIP release control, and predictive flow intelligence when the problem is ill-defined and new methodologies need to be invented - Lead the design, implementation, and successful delivery of scientifically complex solutions for real-time manufacturing flow optimization in production - Design and build ML models and optimization algorithms including constraint prediction, starvation risk forecasting, and dispatch optimization - Write a significant portion of critical-path scientific code with solutions that are inventive, maintainable, scalable, and extensible - Execute rapid, rigorous experimentation with reproducible results, closing the gap between simulation and real manufacturing environments - Build evaluation benchmarks that measure model performance against manufacturing outcomes including constraint utilization and throughput rather than traditional ML metrics alone - Influence your team's science and business strategy through insightful contributions to roadmaps, goals, and priorities - Partner with manufacturing engineering, robotics simulation, and applied intelligence teams to ensure scientific approaches are grounded in operational reality - Drive your team's scientific agenda and role model publishing of research results at peer-reviewed venues when appropriate and not precluded by business considerations - Actively participate in hiring and mentor other scientists, improving their skills and ability to deliver - Write clear narratives and documentation describing scientific solutions and design choices
US, NY, New York
The Ads Measurement Science team in the Measurement, Ad Tech, and Data Science (MADS) team of Amazon Ads serves a centralized role developing solutions for a multitude of performance measurement products. We create solutions which measure the comprehensive impact of advertiser's ad spend, including sales impacts both online and offline and across timescales, and provide actionable insights that enable our advertisers to optimize their media portfolios. We also own the science solutions for AI tools that unlock new insights and automate high-effort customer workflows, such as custom query and report generation based on natural language user requests. We leverage a host of scientific technologies to accomplish this mission, including Generative AI, classical ML, Causal Inference, Natural Language Processing, and Computer Vision. As a Senior Research Scientist on the team, you will be at the forefront of innovation, developing measurement solutions end-to-end from inception to production. You will set the technical vision and innovate on behalf of our customers. You will propose, design, analyze, and productionize models to provide novel measurement insights to our customers. You will partner with engineering to deploy these solutions into production. You will work with key stakeholders from various business teams to enable advertisers to act upon those metrics. Key job responsibilities * Lead the development of ad measurement models and solutions that address the full spectrum of an advertiser's investment, focusing on scalable and efficient methodologies. * Collaborate closely with cross-functional teams including engineering, product management, and business teams to define and implement measurement solutions. 
* Use state-of-the-art scientific technologies including Generative AI, Classical Machine Learning, Causal Inference, Natural Language Processing, and Computer Vision to develop state of the art models that measure the impact of ad spend across multiple platforms and timescales. * Drive experimentation and the continuous improvement of ML models through iterative development, testing, and optimization. * Translate complex scientific challenges into clear and impactful solutions for business stakeholders. * Mentor and guide junior scientists, fostering a collaborative and high-performing team culture. * Foster collaborations between scientists to move faster, with broader impact. * Regularly engage with the broader scientific community with presentations, publications, and patents. A day in the life You will solve real-world problems by getting and analyzing large amounts of data, generate business insights and opportunities, design simulations and experiments, and develop statistical and ML models. The team is driven by business needs, which requires collaboration with other Scientists, Engineers, and Product Managers across the advertising organization. You will prepare written and verbal presentations to share insights to audiences of varying levels of technical sophistication. Team video https://advertising.amazon.com/help/G4LNN5YWHP6SM9TJ About the team We are a team of scientists across Applied, Research, Data Science and Economist disciplines. You will work with colleagues with deep expertise in ML, NLP, CV, Gen AI, and Causal Inference with a diverse range of backgrounds. We partner closely with top-notch engineers, product managers, sales leaders, and other scientists with expertise in the ads industry and on building scalable modeling and software solutions.
US, WA, Seattle
Amazon's Stores-Ads Science team operates at the intersection of Amazon's Stores and advertising businesses. We develop causal measurement systems, optimization algorithms, and machine learning models that inform how advertising affects shopper engagement, driving selling partner growth and marketplace economics. Our science shapes decisions both at the strategic level and in production systems. We are a team of interdisciplinary scientists who combine causal inference, economic modeling, and machine learning to drive measurable business impact. We are looking for an Applied Science Manager to lead our Ads Impact initiative. This team owns the science of understanding and optimizing how advertising creates value for shoppers and selling partners. What makes this role distinctive is its position at the frontier of AI and Economics: as Amazon's shopping experience evolves from traditional search toward LLM-powered, agentic commerce, the fundamental mechanisms through which advertising creates value are changing. This role will partner with leading scientists and academic researchers to measure these effects through large-scale causal experimentation, and develop novel methods to encode causal and economic reasoning into AI systems that optimize the shopping experience. Key job responsibilities In this role, you will lead a team of scientists, setting the technical vision and science roadmap for ads impact measurement and optimization. You will design experiments that identify the causal mechanisms through which advertising drives shopper engagement, advertiser value, and marketplace outcomes. You will develop optimization algorithms that integrate these causal signals into production and business decision-making, in close partnership with engineering and product teams across the organization. You will lead the research and communicate findings and recommendations to senior leadership through written narratives that connect technical science to business strategy. 
This role requires deep expertise in causal inference and experimental design, combined with strong applied ML skills and the engineering judgment to translate research into production systems. You will hire and develop future science leaders, think strategically, set ambitious roadmaps in highly ambiguous problem spaces, and foster a culture that values both intellectual depth and production impact. You will work cross-functionally, influencing across organizational boundaries to drive alignment on complex, multi-sided tradeoffs.