Small language models for efficient agentic tool calling: Outperforming large models with targeted fine-tuning
2026
As organizations scale adoption of generative AI, model cost optimization and operational efficiency have emerged as critical factors determining sustainability and accessibility. While Large Language Models (LLMs) demonstrate impressive capabilities across diverse tasks, their extensive computational requirements make them cost-prohibitive for routine enterprise use. This limitation motivates the exploration of Small Language Models (SLMs), which can deliver comparable performance in targeted applications while drastically reducing infrastructure overhead (Irugalbandara et al., 2023). In this work, we investigate the feasibility of replacing LLM-driven workflows with optimized SLMs. We trained a domain-adapted SLM to execute representative tasks traditionally handled by LLMs, such as document summarization, query answering, and structured data interpretation. As part of the experiment, we fine-tuned the facebook/opt-350m model for a single epoch using the Hugging Face TRL (Transformer Reinforcement Learning) library, specifically its Supervised Fine-Tuning (SFT) trainer. The OPT-350M model was released by Meta AI in 2022 as part of the OPT (Open Pretrained Transformer) family of models. Similar studies demonstrate that even models at the 350M-parameter scale can meaningfully contribute to instruction-tuning pipelines (Mekala et al., 2024). Experimental results demonstrate that our fine-tuned SLM achieves a 77.55% pass rate on the ToolBench evaluation, significantly outperforming all baseline models, including ChatGPT-CoT (26.00%), ToolLLaMA-DFS (30.18%), and ToolLLaMA-CoT (16.27%). These benchmarks, first introduced in ToolLLM (Qin et al., 2023) and later stabilized by follow-up efforts (Zhang et al., 2024), have become the standard for evaluating tool-augmented reasoning. Recent work has also extended ToolBench traces to preference-based optimization (Zeng et al., 2024) and designed alternative multi-API corpora for tool-use robustness (Liu et al., 2024). These findings emphasize that thoughtful design and targeted training of SLMs can significantly lower barriers to adoption, enabling cost-effective, large-scale integration of generative AI into production systems.
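For concreteness, the following is a minimal sketch of the single-epoch SFT setup described above, written against a recent version of TRL's SFTTrainer API. The dataset file name, text column, batch size, learning rate, and output directory are illustrative assumptions, not the paper's exact experimental configuration.

# Minimal sketch: single-epoch supervised fine-tuning of facebook/opt-350m
# with Hugging Face TRL's SFTTrainer. Dataset path and hyperparameters below
# are placeholders, not the authors' exact configuration.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Hypothetical instruction-tuning corpus with a "text" field per example.
dataset = load_dataset("json", data_files="sft_train.json", split="train")

config = SFTConfig(
    output_dir="opt-350m-sft",
    num_train_epochs=1,               # single epoch, as in the experiment
    per_device_train_batch_size=4,    # assumed value
    learning_rate=2e-5,               # assumed value
    dataset_text_field="text",        # column holding the training text
)

trainer = SFTTrainer(
    model="facebook/opt-350m",        # resolved from the Hugging Face Hub
    train_dataset=dataset,
    args=config,
)
trainer.train()

Passing the model identifier as a string lets TRL instantiate the model and tokenizer automatically; a preloaded model object could be supplied instead if custom loading options are needed.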