We introduce PurpCode, a novel post-training method that aligns coding assistants to perform safety reasoning, defending against malicious cyber activities while producing secure and functional code. Our approach trains a reasoning model in two stages: (i) rule learning, which explicitly teaches the model to reference cyber safety rules so that it avoids facilitating malicious cyber activities and generates vulnerability-free code; and (ii) reinforcement learning, which uses diverse rewards to jointly optimize for model utility and safety. To supply the training pipeline with safety data, we perform internal red-teaming to synthesize comprehensive, high-coverage prompts designed to elicit unsafe cyber activities from the model.
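As a rough illustration of how utility and safety signals might be combined into a single scalar for reinforcement learning, consider the following minimal sketch. It is not the reward design used by PurpCode: the checks, names, and weights (e.g., `tests_passed`, `vulnerability_found`, `w_utility`) are hypothetical placeholders standing in for whatever functional tests, vulnerability scanners, and safety judges a training pipeline might employ.

```python
# Illustrative sketch only; all component names and weights are hypothetical,
# not PurpCode's actual reward implementation.
from dataclasses import dataclass


@dataclass
class RolloutSignals:
    tests_passed: bool         # functional correctness of the generated code
    vulnerability_found: bool  # e.g., flagged by a static analyzer
    assisted_malicious: bool   # e.g., flagged by a safety judge on the response


def combined_reward(sig: RolloutSignals,
                    w_utility: float = 1.0,
                    w_secure: float = 1.0,
                    w_refuse_harm: float = 1.0) -> float:
    """Jointly score utility and safety for one rollout (hypothetical weighting)."""
    utility = w_utility if sig.tests_passed else 0.0
    secure = w_secure if not sig.vulnerability_found else -w_secure
    safe = w_refuse_harm if not sig.assisted_malicious else -w_refuse_harm
    return utility + secure + safe


# Example: functional, vulnerability-free code that does not aid misuse scores highest.
print(combined_reward(RolloutSignals(True, False, False)))  # 3.0
```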
We developed PurpCode as a model developer team participating in the Amazon Nova AI Challenge (2024–2025), in which five university red teams probed and attacked the safety-aligned code models of five model developer teams across four Tournaments. We applied PurpCode to align the Prize LLM 8B model provided by the challenge, a research-only model created solely for the competition (not used in or representative of production). Our evaluation shows that rule learning improves the cyber safety of the prize model by 2× in the Tournament 3 red-teaming evaluation conducted by the university red teams. Reinforcement learning further boosts the rule-learning model's cyber safety by 1.2× in Tournament 3, with minimal degradation of model utility on code generation and basic security knowledge. We also perform an extensive, in-depth evaluation by applying PurpCode to open-source models ranging from 14B to 32B parameters, showing that PurpCode provides state-of-the-art cyber safety compared to prior cyber-safety alignment approaches, without over-refusal or utility degradation. These results demonstrate that our alignment technique is effective and generalizes to a large set of cyber-safety scenarios.