Optimizing LoRA target module selection for efficient fine tuning

Ablation study clarifies trade-offs between accuracy and efficiency when using low-rank adaptation (LoRA) to fine-tune AI models.

Overview by Amazon Nova
  • On the CoCoHD dataset, using o_proj + fc2 achieved a +15% absolute improvement over the base model, compared to only +3% with o_proj alone, demonstrating that task difficulty amplifies the impact of target module selection ("Optimizing LoRA target module selection for efficient fine tuning," Amazon Science, 2026).
  • The o_proj-only configuration demonstrated remarkable consistency, never failing outright on any task and typically performing within a few percentage points of the best configuration, making it an attractive default choice for the Nova 2.0 Lite multimodal reasoning LLM (Ibid.).
  • On average, o_proj LoRA is within 2% of o_proj + fc2 in terms of accuracy but has 22.6% lower latency (TPOT p95 decreases from 10.085ms → 7.803ms), highlighting the efficiency benefits of using o_proj alone (Ibid.).

Fine-tuning a large language model (LLM) on a specific task requires updates to billions of parameters across trillions of tokens, with the attendant costs in GPU resources and time.

Low-rank adaptation (LoRA) is a more efficient alternative that freezes the original model weights but introduces lightweight matrices into specific model sublayers, or “modules”. These matrices (commonly referred to as “adapters”) modify the modules’ weights, enabling not only efficient fine tuning but also on-demand model serving, which dramatically lowers inference costs; base-model sharing across GPUs, which cuts memory requirements; lower download overhead; and parallel inference across multiple adapters.

The question is where to insert these adapters across the model. Empirically, targeting more and larger modules tends to boost performance, because it allows more flexibility in customization; but it also increases training and inference costs. Using a smaller, well-chosen subset preserves most gains with significantly better efficiency.

Using Amazon’s Nova 2.0 Lite multimodal reasoning LLM as our base model, we set ourselves the goal of identifying a subset of standardized target-module configurations that works effectively across the vast majority of customer use cases. Through an ablation study, we identified a module known as o_proj as the single module where adding an adapter achieves the best trade-off between efficiency and accuracy. (o_proj is a linear transformation that mixes representations across attention heads into a single, cohesive form for the rest of the model to use.)

The Transformer architecture

Transformer models — the models responsible for all of AI’s remarkable recent gains — consist largely of blocks that are repeated multiple times. Each block in turn has two main components: an attention mechanism, which determines the relevance of previously seen tokens to the token currently being processed, and a feed-forward network, a conventional neural network that does additional processing on the outputs of the attention mechanism.

The attention mechanism involves three different matrices, which take their names from database design: the query matrix represents how relevant the current token is to the other tokens in the input sequence; the key matrix represents how relevant other tokens are to one another; and the value matrix represents the raw content of those other tokens. Multiplying the three matrices together creates, essentially, a recipe for the Transformer's next output.

To reduce computational complexity, these multiplications take place in a space with reduced dimensions. The matrices themselves and the results of their multiplication then have to be projected back up to the original dimensions of the input.
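As a rough illustration of the data flow described above, here is a minimal pure-Python sketch of standard multi-head scaled dot-product attention for a single token. It is not Nova's implementation: the function names, the toy dimensions, and the softmax scaling are assumptions taken from the textbook formulation. Each head works in a reduced dimension, and the final matrix W_o (the module this article calls o_proj) mixes the concatenated head outputs back into the model dimension.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_head(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]

def multi_head_attention(tokens, heads, W_o):
    """Attend from the last token over all tokens, then apply o_proj (W_o).

    tokens: list of d_model-sized vectors (the sequence seen so far)
    heads:  list of (W_q, W_k, W_v) tuples, each matrix d_head x d_model
    W_o:    d_model x (num_heads * d_head) output projection
    """
    current = tokens[-1]
    concatenated = []
    for W_q, W_k, W_v in heads:
        query = matvec(W_q, current)              # project down to d_head
        keys = [matvec(W_k, t) for t in tokens]
        values = [matvec(W_v, t) for t in tokens]
        concatenated.extend(attention_head(query, keys, values))
    return matvec(W_o, concatenated)              # mix heads, project back up

# Toy example: d_model = 4, two heads with d_head = 2 (illustrative values only).
head_a = ([[1, 0, 0, 0], [0, 1, 0, 0]],) * 3
head_b = ([[0, 0, 1, 0], [0, 0, 0, 1]],) * 3
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
out = multi_head_attention([[1.0, 2.0, 3.0, 4.0]], [head_a, head_b], identity)
```

With a single token in the sequence, each head simply returns its own slice of that token, so `out` reproduces the input vector. Replacing `identity` with a learned W_o is what lets the model blend information across heads.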

LoRA approximates weight updates using a product of two smaller matrices, drastically reducing the number of trainable parameters. The technique is typically applied to attention projection layers and feed-forward network layers. These modules are ideal candidates because they constitute the bulk of Transformer parameters, directly govern representation learning, and exhibit natural alignment with low-rank approximations. Empirical evidence shows weight changes in these layers often lie within a low-dimensional subspace during fine tuning.

LoRA for a generic layer-weight matrix (W). The weights are modified by the product of two smaller matrices (A and B), whose lower dimensions drastically reduce the number of trainable parameters.
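To make the parameter savings concrete, the following sketch builds the adapted weight as W + scale * (B @ A) and counts trainable parameters. The layer size (4096 x 4096), the rank (16), and the `scale` argument (commonly alpha divided by rank) are illustrative assumptions, not values reported for Nova 2.0 Lite. The shapes follow the usual LoRA convention of B being d_out x r and A being r x d_in.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_effective_weight(W, A, B, scale=1.0):
    """Return W + scale * (B @ A); W stays frozen, only A and B are trained."""
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

def trainable_params(d_out, d_in, rank):
    """Compare full fine tuning of W with a rank-`rank` LoRA adapter."""
    full = d_out * d_in
    lora = rank * (d_in + d_out)
    return full, lora

# Hypothetical layer size and rank, chosen only to show the order of magnitude.
full, lora = trainable_params(d_out=4096, d_in=4096, rank=16)
print(f"full: {full:,}  LoRA: {lora:,}  ratio: {lora / full:.2%}")
# full: 16,777,216  LoRA: 131,072  ratio: 0.78%
```

Under these assumed sizes, the adapter trains less than 1% of the parameters that a full update of the same matrix would touch.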

Target module selection

Selecting the right target modules directly affects accuracy, latency, and computational efficiency. The optimal choice of target modules depends primarily on (a) the base model being fine-tuned (i.e., its architecture, pre- and post-training data distributions, etc.) and (b) the customization domain or modality.

When fine-tuning Nova 2.0 Lite, we balanced two competing objectives:

  1. Maximizing accuracy across diverse tasks and modalities and
  2. Minimizing latency to preserve LoRA's efficiency benefits.

We investigated the application of LoRA to four different modules in each Transformer block: the query, key, and value projection layers (qkv); the o_proj layer; and two different fully connected layers in the feed-forward network, gate_up_proj and gate_down_proj (referred to as fc1 and fc2). Below are the trade-offs for these modules, both singly and in combination, based on results published in the literature and on empirical studies.

| Combination | Expected accuracy | Expected latency | Use case |
|---|---|---|---|
| qkv only | Good (baseline) | Lowest | Resource-constrained environments; tasks where attention mechanisms are critical (e.g., classification, lightweight generation); prioritizes speed over maximum accuracy |
| o_proj only | Moderate | Lowest | Ultralow-latency scenarios; tasks where refining attention outputs is sufficient (e.g., simple sentiment analysis), plays an important role in reasoning; less effective than qkv, but very efficient |
| qkv + o_proj | High | Low to moderate (+5–10%) | Attention-focused tasks (e.g., machine translation, summarization); balances refinement of both attention context (o_proj) and query/key/value projections (qkv); best accuracy-to-latency ratio for most NLP tasks |
| qkv + fc1 / fc2 | Very high (close to full fine tuning) | Moderate (+10–15%) | Complex generation tasks (e.g., translation, long-form summarization); when feed-forward layers (fc1/fc2) significantly influence output quality, as they store and retrieve factual knowledge; prioritizes accuracy over speed |
| o_proj + fc1 / fc2 | Good to high | Moderate (+5–10%) | Tasks requiring adaptation of both attention output (o_proj) and feed-forward layers (e.g., text classification, sentiment analysis); suitable when qkv adaptation is unnecessary |
| qkv + o_proj + fc1 / fc2 | Highest (near-full fine tuning) | High (+15–20%) | Maximum accuracy for critical tasks (e.g., research benchmarks, high-stakes generation); when all components of the Transformer block need adaptation; avoid for production if latency matters |
| All modules (qkv, o_proj, fc1, fc2) | Maximum | Highest (+20–25%) | Prototyping/research with no latency constraints; rarely justified in practice, with marginal gains over qkv + o_proj + fc1/fc2 |

Trade-offs of accuracy and latency across target modules, based on literature review and empirical evidence.
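The configurations above amount to choosing which sublayers in every Transformer block receive an adapter. A minimal sketch of that selection follows, assuming fc1 and fc2 correspond to gate_up_proj and gate_down_proj in the order the text lists them. The module paths (`blocks.0.attn.o_proj` and so on) and the helper name `select_target_modules` are hypothetical and do not reflect Nova's internal naming.

```python
# Short names used in the article, mapped to the sublayer names it gives.
# The fc1/fc2 pairing is assumed from the order in which they are listed.
ALIASES = {
    "qkv": "qkv",
    "o_proj": "o_proj",
    "fc1": "gate_up_proj",
    "fc2": "gate_down_proj",
}

def select_target_modules(module_names, configuration):
    """Return the modules that should receive adapters for a configuration.

    configuration is a string such as "o_proj" or "o_proj + fc2".
    """
    wanted = {ALIASES[part.strip()] for part in configuration.split("+")}
    return [name for name in module_names if name.rsplit(".", 1)[-1] in wanted]

# Hypothetical module paths for a two-block model (not Nova's real names).
modules = [
    f"blocks.{i}.{sub}"
    for i in range(2)
    for sub in ("attn.qkv", "attn.o_proj", "mlp.gate_up_proj", "mlp.gate_down_proj")
]
print(select_target_modules(modules, "o_proj + fc2"))
```

Matching on the last path component keeps the configuration independent of how many blocks the model has, which is why a single short string can describe the adapter layout for the whole network.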

Experimental methodology

We conducted a comprehensive ablation study, training multiple supervised-fine-tuning (SFT) LoRA variants on seven datasets spanning both text and visual data, across reasoning (i.e., the training datasets themselves include reasoning content) and non-reasoning tasks. The datasets covered diverse challenges from simple question answering to long-context summarization and structured JSON extraction.

| Dataset | Modality | Reasoning traces | Domain | Tasks | Training size | Eval size | Eval metric | Source |
|---|---|---|---|---|---|---|---|---|
| FinCOT | Text | Yes | Finance | Financial-reasoning dataset. Samples consist of complex financial queries, along with reasoning traces obtained from GPT-4o. Predictions are typically complex tables or calculations based on the input. | 7436 | 1147 | Accuracy | https://huggingface.co/datasets/TheFinAI/FinCoT |
| GovReport | Text | No | Government Doc | Large-context (30-40K tokens) summarization | 17457 | 837 | RougeLsum | https://gov-report-data.github.io/ |
| MedMCQA | Text | No | Medical | Dataset for multiple-choice QA; also used in Nova 1.0 | 20k | 3683 | Accuracy | https://huggingface.co/datasets/openlifescienceai/medmcqa |
| MedReason | Text | Yes | Medical | Medical-reasoning dataset that consists of questions and answers compiled from various medical benchmarks (MedQA, MedMCQA, etc.), along with synthetic, high-quality reasoning traces. (This uses the same eval set as MedMCQA.) | 31682 | 3683 | Accuracy | https://huggingface.co/datasets/UCSC-VLAA/MedReason |
| CoCoHD | Text | No | Political Doc | A complex benchmark consisting of large-context (>20K tokens) transcripts of congressional hearings. The output is expected to be a summary in a specific JSON format, consisting of the members present, topic discussed, outcomes, etc. | 732 | 1053 | Averaged key and value match rate | https://github.com/gtfintechlab/CoCoHD |
| Llava-COT | Image | Yes | Image understanding, General/Science | Multimodal, image benchmark consisting of Q&A reasoning questions. The dataset includes high-quality reasoning traces. | 10k | 270 | Exact match rate | https://huggingface.co/datasets/Xkev/LLaVA-CoT-100k |
| Invoice OCR | Image | No | Image understanding | OCR benchmark that takes an input image and produces a JSON file with fields from the image. | 1400 | 447 | Accuracy | |

Summary of the experiment datasets

All experiments used the Nova 2.0 Lite general-availability checkpoint with consistent hyperparameters across target modules, including learning-rate ratio and alpha values.

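| Target dataset | Setting | SFT LoRA target performance | Nova 2.0 Lite performance |
|---|---|---|---|
| Fin-COT | qkv | 67.09% | 72.12% |
| | o_proj | 68.30% | |
| | fc1 | 75.35% | |
| | fc2 | 60.24% | |
| | o_proj + fc1 | 61.38% | |
| | qkv + fc2 | 60.31% | |
| | o_proj + fc2 | 62.79% | |
| | qkv + fc1 | 68.37% | |
| | All target modules | 66.15% | |
| CoCoHD | qkv | 19.64% | 45.14% |
| | o_proj | 65.88% | |
| | fc1 | 41.96% | |
| | fc2 | 17.62% | |
| | o_proj + fc1 | 76.83% | |
| | qkv + fc2 | 66.47% | |
| | o_proj + fc2 | 79.14% | |
| | qkv + fc1 | 45.45% | |
| | All target modules | 82.75% | |
| GovReport | o_proj | 41.25% | 38.90% |
| | fc1 | 39.69% | |
| | o_proj + fc1 | 41.74% | |
| | o_proj + fc2 | 42.16% | |
| | qkv + fc1 | 41.66% | |
| | qkv + fc2 | 39.02% | |
| | All target modules | 41.95% | |
| Llava-COT | qkv | 64.26% | 16.22% |
| | o_proj | 64.26% | |
| | fc1 | 65.92% | |
| | fc2 | 65.02% | |
| | o_proj + fc1 | 63.21% | |
| | qkv + fc2 | 62.76% | |
| | o_proj + fc2 | 66.37% | |
| | qkv + fc1 | 66.52% | |
| | All target modules | 63.96% | |
| Invoice OCR | o_proj | 89.07% | 14.10% |
| | o_proj + fc1 | 90.03% | |
| | qkv + fc2 | 87.84% | |
| | o_proj + fc2 | 89.47% | |
| | qkv + fc1 | 88.55% | |
| | All target modules | 90.11% | |
| MedReason | o_proj | 24.55% | 1.68% |
| | o_proj + fc1 | 20.88% | |
| | qkv + fc2 | 8.39% | |
| | o_proj + fc2 | 20.36% | |
| | qkv + fc1 | 4.32% | |
| | All target modules | 26.72% | |
| MedMCQA | qkv | 62.18% | 1.68% |
| | o_proj | 63.10% | |
| | fc1 | 12.90% | |
| | fc2 | 59.98% | |
| | o_proj + fc1 | 61.39% | |
| | qkv + fc2 | 65.63% | |
| | o_proj + fc2 | 64.95% | |
| | qkv + fc1 | 57.21% | |
| | All target modules | 66.11% | |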

Ablation study for target module selection. Some benchmarks have fewer variations, to save on computation and time. MedMCQA and MedReason use the MedMCQA test set for evaluation. On this task, Nova 2.0 Lite fails mainly due to formatting inconsistencies, even though it produces the right answer. For consistency’s sake, we use the same strict parser for SFT models.
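One way to read the table is to compute, per dataset, how far a reference configuration sits above the base model and below the best configuration that was run. The sketch below does this for o_proj. The scores are transcribed from the table, on the assumption that the single Nova 2.0 Lite value listed per dataset is the base-model score for the whole group. The `summarize` helper and the short label "all" for "All target modules" are illustrative names.

```python
# Scores (%) transcribed from the ablation table: base-model score first,
# then every LoRA configuration that was run for that dataset.
RESULTS = {
    "Fin-COT": (72.12, {"qkv": 67.09, "o_proj": 68.30, "fc1": 75.35, "fc2": 60.24,
                        "o_proj + fc1": 61.38, "qkv + fc2": 60.31,
                        "o_proj + fc2": 62.79, "qkv + fc1": 68.37, "all": 66.15}),
    "CoCoHD": (45.14, {"qkv": 19.64, "o_proj": 65.88, "fc1": 41.96, "fc2": 17.62,
                       "o_proj + fc1": 76.83, "qkv + fc2": 66.47,
                       "o_proj + fc2": 79.14, "qkv + fc1": 45.45, "all": 82.75}),
    "GovReport": (38.90, {"o_proj": 41.25, "fc1": 39.69, "o_proj + fc1": 41.74,
                          "o_proj + fc2": 42.16, "qkv + fc1": 41.66,
                          "qkv + fc2": 39.02, "all": 41.95}),
    "Llava-COT": (16.22, {"qkv": 64.26, "o_proj": 64.26, "fc1": 65.92, "fc2": 65.02,
                          "o_proj + fc1": 63.21, "qkv + fc2": 62.76,
                          "o_proj + fc2": 66.37, "qkv + fc1": 66.52, "all": 63.96}),
    "Invoice OCR": (14.10, {"o_proj": 89.07, "o_proj + fc1": 90.03, "qkv + fc2": 87.84,
                            "o_proj + fc2": 89.47, "qkv + fc1": 88.55, "all": 90.11}),
    "MedReason": (1.68, {"o_proj": 24.55, "o_proj + fc1": 20.88, "qkv + fc2": 8.39,
                         "o_proj + fc2": 20.36, "qkv + fc1": 4.32, "all": 26.72}),
    "MedMCQA": (1.68, {"qkv": 62.18, "o_proj": 63.10, "fc1": 12.90, "fc2": 59.98,
                       "o_proj + fc1": 61.39, "qkv + fc2": 65.63,
                       "o_proj + fc2": 64.95, "qkv + fc1": 57.21, "all": 66.11}),
}

def summarize(results, reference="o_proj"):
    """Per dataset: gain of `reference` over base and its gap to the best run."""
    rows = []
    for dataset, (base, scores) in results.items():
        best_name = max(scores, key=scores.get)
        rows.append({
            "dataset": dataset,
            "gain_over_base": scores[reference] - base,
            "best": best_name,
            "gap_to_best": scores[best_name] - scores[reference],
        })
    return rows

for row in summarize(RESULTS):
    print(f"{row['dataset']:<12} gain {row['gain_over_base']:+7.2f}  "
          f"best {row['best']:<13} gap {row['gap_to_best']:6.2f}")
```

All differences are in absolute percentage points. Passing a different `reference`, such as "o_proj + fc2", gives the same comparison for any configuration that was run on every dataset.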

Key findings

1. O_proj is the most robust single target

The o_proj-only configuration demonstrated remarkable consistency, never failing outright on any task and typically performing within a few percentage points of the best configuration (i.e., using all target modules). On MedMCQA, CoCoHD, GovReport, LLaVA-CoT, and Invoice OCR, o_proj-only either matched or came very close to optimal performance, making it an attractive default choice that balances performance and simplicity. There is emerging evidence that this module plays a key role in reasoning, which may explain its effectiveness here.

2. Qkv-only shows instability

While qkv-only performed well on MedMCQA, it exhibited extreme variability, performing below baseline on CoCoHD and showing unremarkable results elsewhere. This aligns with the hypothesis that attention-only LoRA can underfit on tasks requiring richer features from the feed-forward network, rather than relying on modified token routing.

3. Module combinations provide modest gains

Combinations like o_proj + fc2 or "all target modules" often achieved the highest per-dataset scores (particularly on CoCoHD, MedReason, and Invoice OCR). However, improvements over the best single module were typically modest, usually 1-3 percentage points.

4. Task difficulty amplifies configuration impact

On challenging benchmarks where the base model performed poorly, the choice of target modules had greater impact. For example, on CoCoHD (long-context, complex JSON generation), o_proj + fc2 achieved a +15% absolute improvement over the base model, compared to only +3% with o_proj alone.

5. LoRA consistently outperforms base models

Across nearly all datasets, any reasonable LoRA configuration dramatically outperformed the base model. For instance, MedReason, MedMCQA, LLaVA-CoT, and Invoice OCR showed improvements from a baseline accuracy of ~1-16% to 60-90%+ with LoRA. The notable exception was Fin-COT, where only certain configurations (notably fc1) exceeded baseline performance, suggesting task-specific sensitivity to adaptation strategy.

Recommendations

For accuracy-prioritized scenarios, we recommend o_proj + fc2 as the optimal configuration for both text and multimodal tasks, showing 2-12% improvements over o_proj alone across benchmarks.

For balanced efficiency and performance, o_proj-only provides an excellent default, offering robust performance with minimal latency overhead — particularly valuable when serving multiple adapters or operating under resource constraints.

For challenging tasks, such as benchmarks with long context or complex generation requirements or other tasks where base models struggle, the additional accuracy from o_proj + fc2 justifies the modest latency increase.

Future directions

Our research opens several promising avenues for further optimization:

  1. Modality and task-specific configurations: Segmenting target module selection by modality and task difficulty (e.g., long-context scenarios) could yield specialized configurations with better accuracy-latency trade-offs.
  2. Per-module hyperparameter optimization: Extensive hyperparameter optimization for each target module configuration could unlock additional performance gains, though computational costs remain a consideration.
  3. Two-stage LoRA for early candidate identification: Leveraging two-stage LoRA approaches that use training dynamics, gradients, etc., to determine the importance of different modules/layers could help identify promising configurations early in training, reducing the cost of comprehensive hyperparameter searches.
  4. Layer pruning for latency reduction: Using two-stage training to identify and prune unused layers could further reduce inference latency while maintaining accuracy.

Conclusion

Our comprehensive study demonstrates that thoughtful target module selection in LoRA fine tuning can improve accuracy while preserving the efficiency advantages that make LoRA attractive for production deployments. The o_proj layer emerges as a remarkably robust single target, while o_proj + fc2 combinations offer the best accuracy for challenging tasks. On average, o_proj LoRA is within 2% of o_proj + fc2 in terms of accuracy but has 22.6% lower latency (TPOT p95 decreases from 10.085ms → 7.803ms). These findings provide a principled foundation for standardizing LoRA configurations across diverse customer use cases, balancing the competing demands of model performance and computational efficiency.
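The two averages quoted above can be checked with a few lines of arithmetic. This sketch assumes the accuracy comparison is a simple unweighted mean over the seven datasets, which the article does not state explicitly. The latency figures are the TPOT p95 values given in the text.

```python
from statistics import mean

# Per-dataset accuracy (%) from the ablation table, in the order:
# Fin-COT, CoCoHD, GovReport, Llava-COT, Invoice OCR, MedReason, MedMCQA.
O_PROJ = [68.30, 65.88, 41.25, 64.26, 89.07, 24.55, 63.10]
O_PROJ_FC2 = [62.79, 79.14, 42.16, 66.37, 89.47, 20.36, 64.95]

# Assumed averaging method: unweighted mean across datasets.
accuracy_gap = mean(O_PROJ_FC2) - mean(O_PROJ)   # absolute percentage points

# TPOT p95 in milliseconds, as quoted in the text.
TPOT_O_PROJ_FC2_MS = 10.085
TPOT_O_PROJ_MS = 7.803
latency_reduction = (TPOT_O_PROJ_FC2_MS - TPOT_O_PROJ_MS) / TPOT_O_PROJ_FC2_MS

print(f"mean accuracy gap: {accuracy_gap:.2f} points")   # 1.26 points
print(f"latency reduction: {latency_reduction:.1%}")     # 22.6%
```

Under that assumption, the mean gap of about 1.26 points is consistent with the "within 2%" figure, and the latency ratio reproduces the 22.6% reduction.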

Acknowledgements: Kevin Rondinone, Kevin Chen, Nicole Ding, Sebastian Massella, Andy Li

Research areas

Related content

IN, KA, Bengaluru
Are you passionate about solving complex business problems at scale through Generative AI? Do you want to help build intelligent systems that reason, act, and learn from minimal supervision? If so, we have an exciting opportunity for you on Amazon's Trustworthy Shopping Experience (TSE) team. At TSE, our vision is to guarantee customers a worry-free shopping experience by earning their trust that the products they buy are safe, authentic, and compliant with regulations and policy. We do this in close partnership with our selling partners, empowering them with best-in-class tools and expertise to offer a high-quality, compliant selection that customers trust. As an Applied Scientist I, you will bring subject matter expertise in at least one relevant discipline (e.g., NLP, computer vision, representation learning, agentic architecture) to contribute to next-generation agentic AI solutions that automate complex manual investigation processes at Amazon scale. Working alongside senior scientists, you will map business goals—such as reducing cost-of-serving while maintaining trust and safety standards—to well-defined scientific problems and metrics. You will invent, refine, and experiment with solutions spanning agentic reasoning, self-supervised representation learning, few-shot adaptation, multimodal understanding, and model compression. With guidance from senior scientists, you will stay current on research trends and benchmark your results against the state of the art. You will help design and execute experiments to identify optimal solutions, initiating the development and implementation of small components with team guidance. You will write secure, stable, testable, and well-documented production code at the level of an SDE I, rigorously evaluating models and quantifying performance. You will handle data in accordance with Amazon policies, troubleshoot issues to root cause, and ensure your work does not put the company at risk. 
Your scope of influence will typically be at the self-level, with the possibility of mentoring interns. You will participate in team design and prioritization discussions, learn the business context behind TSE's products, and escalate problems with proposed solutions. You will publish internal technical reports and may contribute to peer-reviewed publications and external review activities when aligned with business needs. This role offers a unique opportunity to contribute to end-to-end AI development—from research through production—with your contributions serving hundreds of millions of customers within months, not years. Key job responsibilities • Contribute to the design and development of agentic AI systems with multi-step reasoning, autonomous task execution, and multimodal intelligence, including feedback and memory mechanisms, leveraging reinforcement learning techniques for agent decision-making and policy optimization, with input and guidance from senior scientists • Help productionize models built on top of SFT (Supervised Fine-tuning) and RFT (Reinforced Fine-tuning) approaches, as well as few-shot approaches based on multimodal datasets spanning text, images, and structured data, applying mathematical optimization techniques to improve efficiency, resource allocation, and decision-making in complex workflows, working alongside senior scientists to identify optimal solutions • Contribute to building production-ready deep learning and conventional ML solutions, including multimodal fusion and cross-modal alignment techniques that seamlessly connect visual, textual, and relational understanding, to support automation requirements within your team's scope • Help identify customer and business problems; use reasonable assumptions, data, and customer requirements to solve well-defined scientific problems involving multimodal inputs such as unstructured text, documents, product images, and relational data, developing representations that capture 
complementary signals across modalities and mapping business goals to scientific metrics • May co-author research papers for peer-reviewed internal and/or external venues, including contributions in areas such as multimodal representation learning and vision-language modeling, and contribute to the wider scientific community by reviewing research submissions, when aligned with business needs • Prototype rapidly, iterate based on feedback, and deliver small components at SDE I level—including multimodal data pipelines and inference modules—that integrate into production-scale systems • Write secure, stable, testable, maintainable, and well-documented code, balancing model capability, deployment cost, and resource usage across multimodal architectures while understanding state-of-the-art data structures, algorithms, and performance tradeoffs • Rigorously test code and evaluate models across individual and combined modalities, quantifying their performance; troubleshoot issues, research root causes, and thoroughly resolve defects, leaving systems more maintainable • Participate in team design, scoping, and prioritization discussions through clear verbal and written communication; seek to learn the business context, science, and engineering behind your team's products, including how multimodal signals contribute to trust and safety decisions • Participate in engineering best practices with peer reviews; clearly document approaches and communicate design decisions; publish internal technical reports to institutionalize scientific learning • Help train and mentor scientist interns; identify and escalate problems with proposed solutions, taking ownership or ensuring clear hand-off to the right owner About the team Trustworthy Shopping Experience Product team in TSE is responsible for the human-in-the-loop products and technology used in the risk investigations at Amazon. 
The team is also responsible for reducing the cost of performing the investigations, by automating wherever possible and optimizing the experience where manual interventions are needed. The team leverages state-of-the art technology and GenAI to deliver the products and associated goals.
US, NY, New York
Do you want to lead the Ads industry and redefine how we measure the effectiveness of Amazon Ads business? Are you passionate about causal inference, Deep Learning & AI, raising the science bar, and connecting leading-edge science research to Amazon-scale implementation? If so, come join Amazon Ads to be a science leader within our Advertising Incrementality Measurement science team! Our work builds the foundations for providing customer-facing advertising measurement tools, furthering internal research & development, and building out Amazon's advertising measurement offerings. Incrementality is a lynchpin for the next generation of Amazon Advertising measurement solutions, and this role will play a key role in the release and expansion of these offerings. We are looking for a thought leader that has an aptitude for delivering customer-focused solutions and who enjoys working on the intersection of Big-Data analytics, Machine/Deep Learning, and Causal Inference. A successful candidate will be a self-starter, comfortable with ambiguity, able to think big and be creative, while still paying careful attention to detail. You should be able to translate how data represents the customer journey, be comfortable dealing with large and complex data sets, and have experience using machine learning and/or econometric modeling to solve business problems. You should have strong analytical and communication skills, be able to work with product managers to define key business questions and work with the engineering team to bring our solutions into production. You will join a highly collaborative and diverse working environment that will empower you to shape the future of Amazon advertising, and also allow you to become part of our large science community. 
Key job responsibilities • Apply expertise in ML/DL, AI, and causal modeling to develop new models that describe how advertising impacts customers’ actions • Own the end-to-end development of novel scientific models that address the most pressing needs of our business stakeholders and help guide their future actions • Improve upon and simplify our existing solutions and frameworks • Review and audit modeling processes and results for other scientists, both junior and senior • Work with leadership to align our scientific developments with the business strategy • Identify new opportunities that are suggested by the data insights • Bring a department-wide perspective into decision making • Develop and document scientific research to be shared with the greater science community at Amazon About the team AIM is a cross disciplinary team of engineers, product managers, economists, data scientists, and applied scientists with a charter to build scientifically-rigorous causal inference methodologies at scale. Our job is to help customers cut through the noise of the modern advertising landscape and understand what actions, behaviors, and strategies actually have a real, measurable impact on key outcomes. The data we produce becomes the effective ground truth for advertisers and partners making decisions affecting millions in advertising spend.
US, NY, New York
The Ads Measurement Science team in the Measurement, Ad Tech, and Data Science (MADS) team of Amazon Ads serves a centralized role developing solutions for a multitude of performance measurement products. We create solutions which measure the comprehensive impact of advertiser's ad spend, including sales impacts both online and offline and across timescales, and provide actionable insights that enable our advertisers to optimize their media portfolios. We also own the science solutions for AI tools that unlock new insights and automate high-effort customer workflows, such as custom query and report generation based on natural language user requests. We leverage a host of scientific technologies to accomplish this mission, including Generative AI, classical ML, Causal Inference, Natural Language Processing, and Computer Vision. As an Applied Scientist on the team, you will lead measurement solutions end-to-end from inception to production. You will propose, design, analyze, and productionize models to provide novel measurement insights to our customers. Key job responsibilities - Leverage deep expertise in one or more scientific disciplines to invent solutions to ambiguous ads measurement problems - Disambiguate problems to propose clear evaluation frameworks and success criteria - Work autonomously and write high quality technical documents - Implement a significant portion of critical-path code, and partner with engineers to directly carry solutions into production - Partner closely with other scientists to deliver large, multi-faceted technical projects - Share and publish works with the broader scientific community through meetings and conferences - Communicate clearly to both technical and non-technical audiences - Contribute new ideas that shape the direction of the team's work - Mentor more junior scientists and participate in the hiring process About the team We are a team of scientists across Applied, Research, Data Science and Economist disciplines. 
You will work with colleagues with deep expertise in ML, NLP, CV, Gen AI, and Causal Inference with a diverse range of backgrounds. We partner closely with top-notch engineers, product managers, sales leaders, and other scientists with expertise in the ads industry and on building scalable modeling and software solutions.
US, NY, New York
The Ads Measurement Science team in the Measurement, Ad Tech, and Data Science (MADS) team of Amazon Ads serves a centralized role developing solutions for a multitude of performance measurement products. We create solutions which measure the comprehensive impact of advertiser's ad spend, including sales impacts both online and offline and across timescales, and provide actionable insights that enable our advertisers to optimize their media portfolios. We also own the science solutions for AI tools that unlock new insights and automate high-effort customer workflows, such as custom query and report generation based on natural language user requests. We leverage a host of scientific technologies to accomplish this mission, including Generative AI, classical ML, Causal Inference, Natural Language Processing, and Computer Vision. As an Applied Scientist on the team, you will lead measurement solutions end-to-end from inception to production. You will propose, design, analyze, and productionize models to provide novel measurement insights to our customers. Key job responsibilities - Leverage deep expertise in one or more scientific disciplines to invent solutions to ambiguous ads measurement problems - Disambiguate problems to propose clear evaluation frameworks and success criteria - Work autonomously and write high quality technical documents - Implement a significant portion of critical-path code, and partner with engineers to directly carry solutions into production - Partner closely with other scientists to deliver large, multi-faceted technical projects - Share and publish works with the broader scientific community through meetings and conferences - Communicate clearly to both technical and non-technical audiences - Contribute new ideas that shape the direction of the team's work - Mentor more junior scientists and participate in the hiring process About the team We are a team of scientists across Applied, Research, Data Science and Economist disciplines. 
You will work with colleagues with deep expertise in ML, NLP, CV, Gen AI, and Causal Inference with a diverse range of backgrounds. We partner closely with top-notch engineers, product managers, sales leaders, and other scientists with expertise in the ads industry and on building scalable modeling and software solutions.