SecureLion: Building a trustworthy AI assistant with security reasoning in a realistic adversarial competition
2025
We present SecureLion, a trustworthy AI assistant designed to securely handle cybersecurity queries and generate vulnerability-free code. Compared to the base model, our system achieves a 76.6% relative reduction in insecure messages in an adversarial competition setting, with negligible utility loss. The success of SecureLion stems primarily from: (1) pervasive security-focused reasoning integrated throughout the pipeline; (2) high-quality synthetic datasets curated through agentic and collaborative workflows; (3) balanced data mixes that ensure both security alignment and utility retention; (4) seamless integration of specialized model variants, including a query intent analyzer, a safe response generator, a robust output guard, and a code vulnerability fixer, to maximize defense effectiveness within stringent latency constraints; and (5) a stable, efficient in-house evaluation framework that guides iterative model development. Rather than relying solely on isolated training optimizations, we emphasize systematic integration and effective collaboration among these components. We release our datasets, training recipes, experimental frameworks, and comprehensive evaluation results highlighting performance differences across data mixes, inference pipelines, and training strategies. The effectiveness and robustness of SecureLion are validated through participation in the Amazon Nova AI Challenge 2025, a real-world adversarial competition with uncontrolled red-team attacks, underscoring the authenticity and practical applicability of our approach. Our contributions provide insights and reproducible resources for researchers and practitioners committed to advancing secure and responsible AI development.