From frontier to frugal: Evaluating self-evolution frameworks with small language models
2026
Optimizing Large Language Models (LLMs) for production AI agent deployment demands substantial computational resources and specialized human expertise (e.g., prompt engineering). Self-evolution offers a promising solution by enabling agents to autonomously enhance their capabilities through structured feedback, improving performance without expensive manual optimization. However, most existing self-evolving agents rely on costly frontier LLMs, which limits scalability. Can small language models (SLMs) achieve effective self-evolution while remaining cost-efficient? We investigate this by applying two SLMs (Qwen and Claude Haiku) to two self-evolution frameworks (Gödel Agent and GEPA) on the MGSM mathematical reasoning, MMLU knowledge reasoning, and ARC science question benchmarks. Our experiments demonstrate that SLMs equipped with feedback loops successfully evolve by rewriting code, optimizing parameters, and refining prompt strategies. SLMs achieve accuracy gains of up to 15.5% with evolution costs as low as $0.04 per optimization cycle, enabling scalable production deployments. While these gains are more modest than those of frontier LLMs, particularly on certain benchmarks such as MGSM, our results demonstrate that self-improving SLMs offer a viable, cost-effective path toward scalable autonomous AI systems.