From frontier to frugal: Evaluating self-evolution frameworks with small language models

Shayan Ali Akbar; Jiaming Qu; Chen Ling; Madhu Gopinathan; Erwin Cornejo

2026

Optimizing large language models (LLMs) for production AI agent deployment demands substantial computational resources and specialized human expertise (e.g., prompt engineering). Self-evolution offers a promising solution: agents autonomously enhance their capabilities through structured feedback, improving performance without expensive manual optimization. However, most existing self-evolving agents rely on costly frontier LLMs, which limits scalability. Can small language models (SLMs) achieve effective self-evolution while remaining cost-efficient? We investigate this question by applying two SLMs (Qwen and Claude Haiku) to two self-evolution frameworks (Gödel Agent and GEPA) on three benchmarks: MGSM (mathematical reasoning), MMLU (knowledge reasoning), and ARC (science questions). Our experiments demonstrate that SLMs equipped with feedback loops successfully evolve by rewriting code, optimizing parameters, and refining prompt strategies. Remarkably, SLMs achieve accuracy gains of up to 15.5% with evolution costs as low as $0.04 per optimization cycle, enabling scalable production deployments. While these gains are more modest than those achieved with frontier LLMs (particularly on certain benchmarks such as MGSM), our results demonstrate that self-improving SLMs offer a viable, cost-effective path toward scalable autonomous AI systems.
