AlquistCoder: A constitution-guided approach to safe, trustworthy code generation
2025
We introduce AlquistCoder, a code-generation system that minimizes the risk of producing malicious content or vulnerable code while maintaining strong Python coding and question answering performance across a wide range of tasks. AlquistCoder first applies an input guardrail classifier that determines whether the user's intent is benign, potentially harmful, or falls into a security-sensitive domain requiring special handling. Based on this classification, the system's coding LLM receives a tailored system prompt and generates a response. An output guardrail classifier then checks this response for security vulnerabilities that may have been introduced inadvertently; if any are found, the system regenerates the answer until it passes the check. Although several public datasets were used for training, most of our training data was synthetically generated. Training followed a multi-stage approach: we first aligned the model through supervised fine-tuning on high-quality examples and then refined it with Direct Preference Optimization (DPO) to improve both code quality and safety. Beyond the architecture, we introduce a novel data generation pipeline inspired by the principles of Constitutional AI and Constitutional Classifiers, yielding a constitution-focused approach tailored to each stage of the training process.
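The inference-time flow described in the abstract (input guardrail, tailored system prompt, generation, output guardrail, regeneration) can be sketched as a small control loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the three intent labels, the system prompt texts, the callables `classify_input`, `generate`, and `is_vulnerable`, and the `max_attempts` cap on regeneration are all hypothetical stand-ins for the paper's classifiers and coding LLM.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical intent labels and system prompts; the paper does not publish its prompts.
SYSTEM_PROMPTS: Dict[str, str] = {
    "benign": "You are a helpful Python coding assistant.",
    "harmful": "Decline requests that could enable malicious activity and briefly explain why.",
    "sensitive": "Answer carefully and follow secure coding practices (validate input, avoid unsafe APIs).",
}


@dataclass
class PipelineResult:
    label: str      # intent label assigned by the input guardrail
    response: str   # final response returned to the user
    attempts: int   # number of generation attempts used
    passed: bool    # whether the output guardrail accepted the response


def run_pipeline(
    user_msg: str,
    classify_input: Callable[[str], str],      # hypothetical input guardrail
    generate: Callable[[str, str], str],       # hypothetical coding LLM: (system_prompt, user_msg) -> text
    is_vulnerable: Callable[[str], bool],      # hypothetical output guardrail
    max_attempts: int = 3,                     # assumed cap; the abstract does not state one
) -> PipelineResult:
    # 1. Classify the user's intent and pick a tailored system prompt.
    label = classify_input(user_msg)
    system_prompt = SYSTEM_PROMPTS[label]

    # 2. Generate, check with the output guardrail, and regenerate on failure.
    response = ""
    for attempt in range(1, max_attempts + 1):
        response = generate(system_prompt, user_msg)
        if not is_vulnerable(response):
            return PipelineResult(label, response, attempt, True)

    # 3. Every attempt was flagged; report the last draft as not passing.
    return PipelineResult(label, response, max_attempts, False)


# Usage with toy stubs: the first draft uses MD5 (flagged), the second uses PBKDF2.
drafts = iter([
    "import hashlib\nhashlib.md5(pw.encode())",
    "import hashlib, os\nhashlib.pbkdf2_hmac('sha256', pw.encode(), os.urandom(16), 600_000)",
])


def toy_classifier(msg: str) -> str:
    return "sensitive" if "password" in msg.lower() else "benign"


def toy_generate(system_prompt: str, msg: str) -> str:
    return next(drafts)


def toy_is_vulnerable(code: str) -> bool:
    return "md5" in code


result = run_pipeline("Hash a user password", toy_classifier, toy_generate, toy_is_vulnerable)
print(result.label, result.attempts, result.passed)  # sensitive 2 True
```

Capping regeneration is a choice made for this sketch only: without a cap, a request whose drafts the output guardrail never accepts would loop indefinitely, so a real deployment needs some fallback (for example, a refusal) once attempts run out.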
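The DPO refinement stage mentioned in the abstract optimizes the standard DPO objective, $-\log\sigma\big(\beta[(\log\pi_\theta(y_w|x)-\log\pi_{\text{ref}}(y_w|x))-(\log\pi_\theta(y_l|x)-\log\pi_{\text{ref}}(y_l|x))]\big)$, over preference pairs of a chosen response $y_w$ and a rejected response $y_l$. The per-pair loss is sketched below from sequence log-probabilities. It is a minimal sketch, not the authors' training code: the paper's value of $\beta$ is not stated here (0.1 is only an assumed default), and treating secure, higher-quality code as "chosen" and vulnerable code as "rejected" is an illustrative assumption.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed default; not taken from the paper
) -> float:
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen log-ratio - rejected log-ratio)))."""
    # Log-ratios of the trained policy against the frozen reference (SFT) model.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) == softplus(-m), computed in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# When the policy equals the reference, the margin is 0 and the loss is log(2).
print(dpo_loss(-1.0, -1.0, -1.0, -1.0))
```

The loss falls below $\log 2$ as the policy raises the likelihood of the chosen response relative to the rejected one, compared with the reference model.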