While large language models (LLMs) achieve strong performance in code generation, persistent security vulnerabilities hinder their safe deployment. Starting from a pretrained CodeGen model without inherent safety mechanisms, we develop a systematic synthetic instruction-tuning workflow to progressively enhance model security. Our pipeline begins with taxonomy-guided synthetic data, capturing diverse attack vectors across syntactic, semantic, and embedding dimensions. We iteratively refine this dataset through targeted expansions, adaptive patching of identified failure modes, sophisticated dynamic attacks—including vulnerability-guided scenarios, skill-based exploits, and agent-driven interactions—and external red-teaming practices. Leveraging this enriched dataset, we fine-tune multiple models us-ing supervised fine-tuning (SFT) and direct preference optimization (DPO), ultimately fusing these into a secure, robust code-generation architecture.
Empirical evaluations across standardized benchmarks and adaptive LLM-based judges show that our iterative development process significantly reduces vulnerabilities, consistently outperforming leading baseline models (Claude, Gemini-Pro, CodeLlama). Our approach ranked second in Tournament 1 and first in Tournament 2, demonstrating the efficacy and practical utility of our workflow. Unlike conventional AI safety approaches focusing predominantly on refusal behaviors, secure code-generation requires precise balancing between generating code with minimal vulnerabilities and reliably refusing malicious requests. To this end, we contribute a detailed, systematic, and reproducible end-to-end pipeline, exemplifying best practices for steering code-generating models toward enhanced safety and security.
Data is all you need (almost): Iterative synthetic instruction tuning for secure code generation
2025