About this CFP
What is Build on Trainium?
Build on Trainium is a $110MM credit program focused on AI research and university education to support the next generation of innovation and development on AWS Trainium. AWS Trainium chips are purpose-built for high-performance deep learning (DL) training of generative AI models, including large language models (LLMs) and latent diffusion models. Build on Trainium provides compute credits for novel AI research on Trainium, investing in leading academic teams to drive innovation in critical areas including new model architectures, ML libraries, optimizations, large-scale distributed systems, and more. This multi-year initiative lays the foundation for the future of AI by inspiring the academic community to utilize, invest in, and contribute to the open-source community around Trainium. By combining these resources with the Neuron software development kit (SDK) and the recently launched Neuron Kernel Interface (NKI), AI researchers can innovate at scale in the cloud.
What are AWS Trainium and Neuron?
AWS Trainium is an AI chip developed by AWS to accelerate the building and deployment of machine learning models. Built on a specialized architecture designed for deep learning, Trainium accelerates the training and inference of complex models with high throughput and scalability, making it ideal for academic researchers looking to optimize performance and costs. This architecture also emphasizes sustainability through energy-efficient design, reducing environmental impact. Amazon has established a dedicated Trainium research cluster featuring up to 40,000 Trainium chips, accessible via Amazon EC2 Trn1 instances. These instances are connected through a non-blocking, petabit-scale network using Amazon EC2 UltraClusters, enabling seamless high-performance ML training. The Trn1 instance family is optimized to deliver substantial compute power for cutting-edge AI research and development. This unique offering not only enhances the efficiency and affordability of model training but also presents academic researchers with opportunities to publish new papers on underrepresented compute architectures, thus advancing the field.
Focus on Kernels for ML Acceleration
- GenAI for Kernel Development: As kernel development becomes increasingly complex and specialized for accelerators like Trainium, generative AI offers opportunities to automate and optimize the kernel development lifecycle. We seek proposals that leverage GenAI to accelerate kernel creation, optimization, and maintenance on Trainium, including the topics below (a minimal sketch of the kind of NKI kernel and correctness check these efforts target follows this list):
- Automated Kernel Generation: Methods for using GenAI to generate high-performance NKI kernels from high-level specifications, including natural language descriptions, mathematical formulations, or reference implementations in other frameworks. This includes agentic workflows leveraging reinforcement learning, inference-time compute scaling, and multi-agent systems with iterative refinement where agents execute kernels, observe performance metrics, and progressively improve implementations through feedback loops.
- Kernel Optimization and Tuning: Techniques for using GenAI to automatically optimize existing kernels, including instruction scheduling, memory access patterns, tile size selection, and register allocation strategies. This encompasses knowledge distillation and memory systems that capture, distill, and organize optimization insights into structured knowledge bases of architecture-specific heuristics, enabling continuous learning from both successful and failed optimization attempts.
- Performance Debugging and Analysis: AI-assisted tools for identifying performance bottlenecks, suggesting optimizations, and explaining performance characteristics of NKI kernels. This includes methods for correctness verification and robustness testing that rigorously verify functional correctness beyond standard test cases, detecting subtle bugs and reward hacking behaviors where kernels achieve favorable metrics while producing incorrect outputs.
- Code Completion and Synthesis: Methods for intelligent code completion, pattern recognition, and synthesis of common kernel idioms specific to NKI and the Trainium architecture. This includes transfer learning and domain adaptation techniques for adapting kernel generation across different hardware generations or compiler versions with minimal training data, as well as explainability methods that make AI-generated kernels interpretable and maintainable through documentation generation and collaborative human-AI development workflows.
- Benchmark Construction and Evaluation: Development of comprehensive, representative benchmark suites for evaluating kernel generation and optimization techniques on Trainium, including systematic methodologies for creating diverse kernel collections spanning operator types, tensor shapes, data layouts, and compute/memory-bound characteristics representative of real-world model workloads.
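For orientation, below is a minimal sketch of the kind of NKI kernel, paired with a NumPy reference check, that automated generation, optimization, and verification efforts would target. It follows the style of the public NKI getting-started examples; the specific API names (nki.jit, neuronxcc.nki.language, nl.shared_hbm) may differ across Neuron SDK releases, so treat it as illustrative rather than canonical.

```python
# Minimal NKI kernel sketch: element-wise addition of two tensors.
# API names follow current public NKI examples and may vary by SDK release.
import numpy as np
from neuronxcc import nki
import neuronxcc.nki.language as nl


@nki.jit
def tensor_add_kernel(a_input, b_input):
    """Element-wise addition of two HBM-resident tensors."""
    # Allocate the output tensor in device memory (HBM).
    c_output = nl.ndarray(a_input.shape, dtype=a_input.dtype, buffer=nl.shared_hbm)
    # Load inputs from HBM into on-chip memory, compute, and store back.
    a_tile = nl.load(a_input)
    b_tile = nl.load(b_input)
    nl.store(c_output, value=a_tile + b_tile)
    return c_output


# Correctness check against a NumPy reference -- the kind of test a
# verification or reward-hacking-detection pipeline would run on every
# generated kernel (shapes kept within a single 128-partition tile).
a = np.random.rand(128, 512).astype(np.float32)
b = np.random.rand(128, 512).astype(np.float32)
assert np.allclose(tensor_add_kernel(a, b), a + b)
```

Proposals would extend well beyond such toy kernels, but even this example exposes the decisions (memory placement, tile shapes, load/store scheduling) that generation and optimization systems must reason about.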
- Developer Tools and Profiling: Effective kernel development requires sophisticated tooling for understanding performance, debugging behavior, and iterating on designs. We seek proposals that advance the NKI developer experience on Trainium, including:
- Novel Profiling Visualizations and Human-Computer Interaction: Innovative visualization techniques that blend performance analysis with HCI research to make complex kernel behavior intuitive and actionable, including interactive 3D performance landscapes, temporal execution flow visualizations, comparative visual analytics across kernel variants, attention-driven bottleneck highlighting, and multi-dimensional performance space exploration tools that enable developers to quickly identify optimization opportunities through visual pattern recognition.
- Performance Modeling and Estimation: Advanced methods for predicting kernel performance before execution, including analytical roofline models extended for the Trainium architecture (a toy roofline-style estimate is sketched after this list), learned performance predictors using neural networks trained on kernel characteristics, hybrid symbolic-numeric performance models, static analysis techniques for estimating memory bandwidth and compute utilization, and probabilistic performance bounds that account for hardware variability and dynamic effects.
- Debugging and Verification Tools: Methods for validating kernel correctness, detecting numerical issues, and debugging complex kernel behaviors, including symbolic execution and formal verification approaches.
- Interactive Development Environments: Enhanced IDE support for NKI development, including syntax highlighting, type checking, inline performance hints derived from real-time estimation models, and integration with existing development workflows.
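As a concrete point of reference, the sketch below shows the kind of back-of-the-envelope roofline estimate that an analytical performance model or an inline IDE hint might surface before a kernel ever runs. The peak-compute and memory-bandwidth figures are placeholders rather than published Trainium specifications; a real model would be calibrated against measured hardware characteristics.

```python
# Roofline-style estimate: attainable throughput is capped by either peak
# compute or memory bandwidth times arithmetic intensity. The hardware
# figures below are placeholders, not Trainium specifications.
PEAK_TFLOPS = 100.0          # placeholder peak compute, TFLOP/s
PEAK_BANDWIDTH_GBS = 800.0   # placeholder device-memory bandwidth, GB/s


def roofline_estimate(flops: float, bytes_moved: float) -> dict:
    """Estimate attainable throughput from a kernel's FLOP count and the
    bytes it moves to and from device memory."""
    intensity = flops / bytes_moved  # arithmetic intensity, FLOP/byte
    attainable_tflops = min(PEAK_TFLOPS, PEAK_BANDWIDTH_GBS * intensity / 1e3)
    return {
        "arithmetic_intensity": intensity,
        "attainable_tflops": attainable_tflops,
        "bound": "compute" if attainable_tflops >= PEAK_TFLOPS else "memory",
    }


# Example: a 1024 x 1024 x 1024 fp32 matmul, assuming A, B, and C each
# touch device memory exactly once.
n = 1024
flops = 2 * n ** 3            # one multiply and one add per inner-loop step
bytes_moved = 3 * n * n * 4   # read A and B, write C, 4 bytes per element
print(roofline_estimate(flops, bytes_moved))
```

Proposals in this area would replace such static assumptions with learned or hybrid models, and with estimates of how much of the roofline a given kernel can realistically reach on Trainium.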
- Kernel Porting and Cross-Framework Translation: The ecosystem of kernel languages continues to fragment, creating barriers to adoption and limiting code reuse. We seek proposals that enable seamless translation between kernel frameworks while preserving high performance on Trainium, including:
- Automated Kernel Translation: Methods for automatically porting kernels from other frameworks (Triton, CUDA, CuTe, Pallas) to NKI while maintaining or improving performance, including semantic-preserving transformations and architecture-specific optimizations (a representative Triton source kernel is sketched after this list).
- Cross-Framework Optimization: Methods for leveraging optimization techniques across different kernel languages, including pattern matching, optimization transfer learning, and unified intermediate representations.
- Performance Portability: Approaches for ensuring translated kernels achieve competitive performance with hand-written implementations, including auto-tuning, architecture-aware code generation, and performance validation frameworks.
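As an illustration of the translation problem, the canonical Triton vector-add below is the kind of source kernel an automated Triton-to-NKI porting tool would take as input. The comments note what a semantics-preserving translation must carry over; the NKI constructs mentioned (nl.load, nl.store, explicit tile shapes) are one plausible mapping, not a prescribed one.

```python
# Canonical Triton vector-add, annotated with the properties a
# Triton-to-NKI translator must preserve.
import triton
import triton.language as tl


@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Implicit launch grid plus per-program offsets: a translator must map
    # this indexing onto NKI's explicit tile shapes and iteration scheme.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    # Masked loads/stores guard the ragged final block; the translated
    # kernel needs an equivalent boundary treatment (masking or shape
    # specialization) around its nl.load / nl.store calls.
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)
```

Even this trivial kernel raises the questions a translation framework must answer systematically: how block sizes map to tile shapes, how masking is expressed, and how performance is validated against a hand-written NKI baseline.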
- Kernel Language Design and Abstractions: The design of kernel languages fundamentally shapes developer productivity and achievable performance. We seek proposals that explore novel language representations, APIs, and abstractions for NKI on Trainium, including:
- Alternative Language Representations: Novel representations for expressing kernel computations, including tensor comprehensions, polyhedral models, and domain-specific languages that improve expressiveness or enable better optimization.
- API Design and Primitives: Improved APIs and primitive operations for kernels, including higher-level abstractions that maintain performance while improving usability, composability, and maintainability.
- Abstraction Layers: Methods for building layered abstractions that allow developers to work at different levels of detail, from high-level operations to low-level hardware control, with smooth transitions between levels (a toy illustration of such layering appears below).
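As a toy illustration of the abstraction gradient such layered designs could expose, the sketch below expresses the same matrix multiply as a one-line tensor contraction and as an explicit loop nest, with NumPy standing in for the upper layers; the lowest layer (explicit tiles, engine mapping, and memory movement) is where an NKI-style interface sits.

```python
# Two abstraction levels for the same computation; the lowest (hardware)
# level is described in comments rather than code.
import numpy as np

A = np.random.rand(64, 32).astype(np.float32)
B = np.random.rand(32, 48).astype(np.float32)

# High level: declarative tensor contraction, no schedule implied.
C_high = np.einsum("ik,kj->ij", A, B)

# Mid level: explicit loop nest that fixes an iteration order but still
# leaves tiling, memory placement, and engine mapping to lower layers.
C_mid = np.zeros((64, 48), dtype=np.float32)
for i in range(64):
    for j in range(48):
        for k in range(32):
            C_mid[i, j] += A[i, k] * B[k, j]

assert np.allclose(C_high, C_mid, atol=1e-4)

# Low level (not shown): tile-by-tile loads into on-chip memory, engine
# dispatch, and stores back to HBM -- the territory of NKI kernels.
```

Language and abstraction proposals would define how developers move between such levels without rewriting their kernels from scratch.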
Timeline
Submission period: March 25 — May 6, 2026 (11:59 PM Pacific Time)
Decision letters will be sent out in August 2026.
Award details
Selected Principal Investigators (PIs) may receive the following:
- Applicants are encouraged to request AWS Promotional Credits in one of two ranges:
- AWS Promotional Credits, up to $50,000
- AWS Promotional Credits, up to $250,000 and, in some cases, beyond
- AWS Trainium training resources, including AWS tutorials and hands-on sessions with Amazon scientists and engineers
Awards are structured as one-time unrestricted gifts. The budget should include a list of expected costs specified in USD, and should not include administrative overhead costs. The final award amount will be determined by the awards panel.
Your receipt and use of AWS Promotional Credits is governed by the AWS Promotional Credit Terms and Conditions, which may be updated by AWS from time to time.
Eligibility requirements
Please refer to the ARA Program rules on the Rules and Eligibility page.
Proposal requirements
PIs are encouraged to clearly explain how their proposed techniques or research studies advance kernel optimization, LLM innovation, distributed systems, or developer efficiency. PIs should either include plans for open-source contributions or state that they do not plan to make any open-source contributions (data or code) under the proposed effort. Proposals for this CFP should be prepared according to the proposal template and are encouraged to be a maximum of 3 pages, not including appendices.
Selection criteria
Proposals will be evaluated on the following:
- Creativity and quality of the scientific content
- Potential impact on the research community and society at large
- Interest expressed in open-sourcing model artifacts, datasets and development frameworks
- Intention to use and explore novel hardware for AI/ML, primarily AWS Trainium and Inferentia
Expectations from recipients
To the extent deemed reasonable, Award recipients should acknowledge the support from ARA. Award recipients will inform ARA of publications, presentations, code and data releases, blogs/social media posts, and other speaking engagements referencing the results of the supported research or the Award. Award recipients are expected to provide updates and feedback to ARA via surveys or reports on the status of their research. Award recipients will have an opportunity to work with ARA on an informational statement about the awarded project that may be used to generate visibility for their institutions and ARA.