About this CFP
What is Build on Trainium?
Build on Trainium is a $110MM credit program focused on AI research and university education to support the next generation of innovation and development on AWS Trainium. AWS Trainium chips are purpose-built for high-performance deep learning (DL) training of generative AI models, including large language models (LLMs) and latent diffusion models. Build on Trainium provides compute credits for novel AI research on Trainium, investing in leading academic teams to build innovations in critical areas including new model architectures, ML libraries, optimizations, large-scale distributed systems, and more. This multi-year initiative lays the foundation for the future of AI by inspiring the academic community to utilize, invest in, and contribute to the open-source community around Trainium. Combined with the Neuron software development kit (SDK) and the recently launched Neuron Kernel Interface (NKI), these resources enable AI researchers to innovate at scale in the cloud.
What are AWS Trainium and Neuron?
AWS Trainium is an AI chip developed by AWS to accelerate building and deploying machine learning models. Built on a specialized architecture designed for deep learning, Trainium accelerates the training and inference of complex models with high throughput and scalability, making it ideal for academic researchers looking to optimize performance and costs. This architecture also emphasizes sustainability through energy-efficient design, reducing environmental impact. Amazon has established a dedicated Trainium research cluster featuring up to 40,000 Trainium chips, accessible via Amazon EC2 Trn1 instances. These instances are connected through a non-blocking, petabit-scale network using Amazon EC2 UltraClusters, enabling seamless high-performance ML training. The Trn1 instance family is optimized to deliver substantial compute power for cutting-edge AI research and development. This unique offering not only enhances the efficiency and affordability of model training but also gives academic researchers opportunities to publish new papers on underrepresented compute architectures, advancing the field.
Focus on Post-Training
Post-training transforms base language models into aligned, useful AI systems. This domain encompasses the techniques applied after pre-training — including supervised fine-tuning, preference optimization, reinforcement learning from human feedback, and model compression — that determine how models behave in deployment. As models scale and alignment requirements grow more sophisticated, post-training methods face fundamental challenges in sample efficiency, scalability, and evaluation.
We seek proposals that advance post-training research on Trainium, addressing open problems across the following key areas:
1. Online Reinforcement Learning and Reward Innovation on Trainium
Online RL for alignment faces fundamental challenges in sample efficiency, training stability, and the complex interplay between policy updates and rollout generation across distributed accelerator topologies. Trainium's architecture, with its high-bandwidth NeuronLink interconnect, native collective communication primitives, and colocated training/inference capability, creates unique opportunities for RL algorithm design that exploits hardware-aware parallelism. We seek proposals that advance online RL research specifically on Trainium, including:
- Algorithmic Innovation on Trainium: Novel RL algorithms for alignment that leverage Trainium's architecture, including hybrid online-offline methods that exploit colocated training and inference on the same chip, multi-agent RL approaches that map naturally to Trainium's NeuronCore topology, and alternatives to the standard actor-critic framework that reduce the weight synchronization overhead inherent in disaggregated accelerator deployments.
- Reward Model Architectures for Accelerator-Efficient Alignment: Novel reward model designs, including multi-objective rewards, process reward models for step-level feedback during reasoning, and ensemble approaches, optimized for Trainium's compute and memory hierarchy, with emphasis on architectures that enable efficient reward inference alongside policy training without requiring separate GPU-based reward serving.
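As a concrete point of reference for the reward-innovation topics above, the KL-regularized reward shaping at the heart of most online RL alignment loops can be sketched in a few lines of NumPy. This is a hardware-agnostic, purely illustrative sketch; the function name and toy values are our own, not part of any Neuron API.

```python
import numpy as np

def shaped_rewards(rm_scores, logp_policy, logp_ref, beta=0.1):
    """Per-token shaped reward used in KL-regularized online RL (RLHF-style):
    the reward-model score is paid out on the final token, and every token
    pays a KL penalty keeping the policy close to the reference model."""
    rm_scores = np.asarray(rm_scores, dtype=float)       # (batch,)
    kl = np.asarray(logp_policy) - np.asarray(logp_ref)  # (batch, seq)
    rewards = -beta * kl                                 # per-token KL penalty
    rewards[:, -1] += rm_scores                          # sequence-level RM score on last token
    return rewards

# Toy example: 2 sequences of 3 tokens each.
r = shaped_rewards(rm_scores=[1.0, -0.5],
                   logp_policy=[[-1.0, -1.2, -0.8], [-2.0, -1.5, -1.0]],
                   logp_ref=[[-1.1, -1.0, -0.9], [-2.0, -1.4, -1.2]],
                   beta=0.1)
```

The interesting systems question on Trainium is where the reward model and the policy live: colocated reward inference avoids the separate GPU-based reward serving tier this sketch implicitly assumes.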
2. Efficient Post-Training Methods
Post-training large models requires substantial compute, limiting iteration speed and accessibility. Trainium's memory hierarchy (28-32 MiB SBUF per NeuronCore, 96-144 GiB HBM per device) and native support for mixed-precision formats (BF16, FP8, MXFP8) create distinct optimization opportunities compared to GPU architectures. We seek proposals that advance efficient post-training on Trainium, including:
- Parameter-Efficient Fine-Tuning: Novel methods beyond LoRA for efficient adaptation on Trainium, including adaptive rank selection that accounts for NeuronCore tensor engine constraints, structured adapters optimized for Trainium's systolic array geometry, and hybrid approaches that exploit the large on-chip SBUF for adapter weight caching.
- Memory-Efficient Training: Techniques for reducing memory footprint during post-training that leverage Trainium's DMA engine architecture and HBM bandwidth characteristics, including activation checkpointing strategies tuned to NeuronCore memory tiers, optimizer state compression compatible with Trainium's native data formats, and host-device offloading via EFA.
- Compute-Optimal Post-Training: Understanding the scaling laws for post-training compute on non-GPU accelerators, including optimal allocation between SFT, preference optimization, and online RL given Trainium's price-performance characteristics relative to GPU alternatives.
- Quantization-Aware Post-Training: Methods for post-training that account for Trainium's native MXFP8 quantization format on Trn3, including QAT for alignment that targets OCP-compliant microscaling (MX) formats, and quantization-robust fine-tuning that bridges the BF16 training to MXFP8 inference gap.
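For orientation, the LoRA baseline that the parameter-efficient methods above aim to improve on can be sketched in NumPy. This is a hardware-agnostic illustration with made-up sizes; it says nothing about how adapters would actually be mapped onto SBUF or the tensor engine.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 64, 4, 8                     # hidden size, adapter rank, scaling

W = rng.standard_normal((d, d))            # frozen base weight
A = rng.standard_normal((r, d)) * 0.01     # trainable down-projection
B = np.zeros((d, r))                       # trainable up-projection, zero-init

def adapted_forward(x):
    # y = x W^T + (alpha / r) * x A^T B^T   -- only A and B are trained
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.standard_normal((2, d))
y = adapted_forward(x)

# With B zero-initialized, the adapter starts as an exact no-op, and
# training touches ~2*d*r parameters instead of d*d.
trainable = A.size + B.size
full = W.size
```

The SBUF-caching idea in the list above corresponds to keeping A and B (a few KiB here) resident on-chip while W streams from HBM.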
3. Scalable Distributed Post-Training Systems
Production post-training requires coordinating training workers, inference workers for rollout generation, reward model inference, and weight synchronization across potentially hundreds of nodes. Trainium's NeuronLink interconnect topology, out-of-NEFF collective communication, and EFA networking present a different distributed systems design space than NVLink/NVSwitch. We seek proposals that advance distributed post-training systems research on Trainium, including:
- Asynchronous Training: Methods for online RL with asynchronous policy updates on Trainium, including staleness management across NeuronCore groups, importance weighting strategies that account for Trainium's collective communication latency profile, and convergence guarantees for non-blocking weight updates via host-side collective communication.
- Efficient Weight Synchronization: Techniques for fast weight transfer between training and inference on Trainium, including delta compression over NeuronLink, partial weight updates that exploit Trainium's native sharding primitives (FSDP, tensor parallelism via Device Mesh), and pipelined synchronization that overlaps compute with communication on separate hardware queues.
- Disaggregated Architectures: System designs that separate training and inference compute across Trainium instances for independent scaling, including communication protocols optimized for EFA fabric, scheduling strategies for heterogeneous NeuronCore allocation, and colocated vs. disaggregated tradeoff analysis specific to Trainium's memory and interconnect constraints.
- Fault Tolerance: Methods for resilient post-training at scale on Trainium clusters, including distributed checkpointing strategies that leverage Trainium's checkpoint APIs, recovery mechanisms for NeuronCore failures during long-running RL loops, and graceful degradation under node failures in multi-node training configurations.
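To illustrate one of the ideas above, a naive top-k delta compression scheme for trainer-to-inference weight synchronization might look like the following NumPy sketch. The function names and sizes are our own; a real system would operate on sharded tensors and ship the (index, value) pairs over the interconnect rather than in-process.

```python
import numpy as np

def topk_delta(new_w, old_w, k):
    """Compress a weight update by keeping only the k largest-magnitude
    deltas; a trainer would ship (indices, values) to inference workers."""
    delta = (new_w - old_w).ravel()
    idx = np.argpartition(np.abs(delta), -k)[-k:]   # top-k by |delta|
    return idx, delta[idx]

def apply_delta(w, idx, vals):
    """Inference-side: apply the sparse update to the stale weight copy."""
    out = w.ravel().copy()
    out[idx] += vals
    return out.reshape(w.shape)

rng = np.random.default_rng(0)
old = rng.standard_normal((8, 8))
new = old + rng.standard_normal((8, 8)) * 0.01
new[0, 0] += 1.0                           # one large update dominates the delta

idx, vals = topk_delta(new, old, k=4)
synced = apply_delta(old, idx, vals)
```

The research questions in the bullets above concern when such lossy synchronization is safe for the RL loop, and how to pipeline it against compute on separate hardware queues.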
4. Agentic Alignment
Agentic systems require alignment not just of outputs, but of decision-making processes, action sequences, and goal-directed behavior. Trainium's ability to colocate model inference with training on the same chip, combined with its native support for dynamic control flow and low-latency collective operations, makes it a natural platform for agentic RL workloads that require tight coupling between generation and learning. We seek proposals that advance agentic alignment on Trainium, including:
- Tool Use and Planning Alignment: Methods for aligning models that interact with external tools and APIs while performing multi-step reasoning on Trainium, including safe tool selection, parameter validation, goal decomposition, intermediate step validation, plan safety verification, and alignment of chain-of-thought reasoning, with emphasis on leveraging Trainium's colocated inference for low-latency tool call evaluation.
- Action Space Safety: Methods for constraining and aligning agent behavior in complex action spaces on Trainium, including safe exploration strategies, action masking, constraint satisfaction, and preventing harmful action sequences.
- Multi-Turn Agent Interactions: Alignment techniques for agents engaged in extended interactions on Trainium, including maintaining alignment across conversation turns, credit assignment for delayed outcomes, and coherent goal-tracking over time.
- Multi-Modal Agent Perception: Aligning agents that perceive and act on multi-modal inputs (vision, language, structured data) on Trainium, including cross-modal consistency, visual grounding for actions, and multi-modal safety assessment.
- Evaluation for Agentic Systems: Benchmarks and metrics specifically designed for agentic alignment on Trainium, including task-completion safety metrics, action-level evaluation, multi-turn coherence assessment, and environment-based safety testing that can be reproduced on Trainium infrastructure.
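As a minimal illustration of the action-space-safety theme above, action masking can be implemented as a masked softmax so that disallowed actions receive exactly zero probability and can never be sampled during exploration. This is a generic, hardware-agnostic sketch; the names are illustrative.

```python
import numpy as np

def masked_policy(logits, allowed):
    """Masked softmax: disallowed actions get probability exactly zero,
    so the agent cannot sample them during exploration."""
    logits = np.where(allowed, logits, -np.inf)  # mask before normalization
    z = logits - logits.max()                    # numerically stable softmax
    p = np.exp(z)
    return p / p.sum()

logits = np.array([2.0, 1.0, 0.5, 3.0])
allowed = np.array([True, True, False, False])   # e.g. unsafe tool calls masked out
p = masked_policy(logits, allowed)
```

Note that the highest-logit action (index 3) is masked out entirely; constraint satisfaction at the distribution level, rather than post-hoc filtering, is what keeps the policy gradient well-defined.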
Timeline
Submission period: March 25 to May 6, 2026 (11:59 PM Pacific Time).
Decision letters will be sent out in August 2026.
Award details
Selected Principal Investigators (PIs) may receive the following:
- AWS Promotional Credits, requested in one of two ranges:
  - up to $50,000
  - up to $250,000 and beyond
- AWS Trainium training resources, including AWS tutorials and hands-on sessions with Amazon scientists and engineers
Awards are structured as one-time unrestricted gifts. The budget should include a list of expected costs specified in USD, and should not include administrative overhead costs. The final award amount will be determined by the awards panel.
Your receipt and use of AWS Promotional Credits is governed by the AWS Promotional Credit Terms and Conditions, which may be updated by AWS from time to time.
Eligibility requirements
Please refer to the ARA Program rules on the Rules and Eligibility page.
Proposal requirements
PIs are encouraged to demonstrate how their proposed techniques or research studies advance kernel optimization, LLM innovation, distributed systems, or developer efficiency. PIs should either include plans for open-source contributions or state that they do not plan to make any open-source contributions (data or code) under the proposed effort. Proposals for this CFP should be prepared according to the proposal template and are encouraged to be at most 3 pages, not including appendices.
Selection criteria
Proposals will be evaluated on the following:
- Creativity and quality of the scientific content
- Potential impact to the research community and society at large
- Interest expressed in open-sourcing model artifacts, datasets and development frameworks
- Intention to use and explore novel hardware for AI/ML, primarily AWS Trainium and Inferentia
Expectations from recipients
To the extent deemed reasonable, Award recipients should acknowledge the support from ARA. Award recipients will inform ARA of publications, presentations, code and data releases, blogs/social media posts, and other speaking engagements referencing the results of the supported research or the Award. Award recipients are expected to provide updates and feedback to ARA via surveys or reports on the status of their research. Award recipients will have an opportunity to work with ARA on an informational statement about the awarded project that may be used to generate visibility for their institutions and ARA.