IclForge: Enhancing in-context learning with active learning under budgeted annotation
2025
In-context learning (ICL) has emerged as a powerful paradigm for adapting Large Language Models (LLMs) to specific tasks without parameter updates. While various strategies exist for selecting relevant ICL exemplars from a labeled pool, the fundamental challenge of constructing this high-quality pool in the first place remains largely unexplored, especially for new tasks or domains with limited labeled data. We present IclForge, a novel active learning framework that efficiently selects informative examples from an unlabeled dataset for annotation and inclusion in the ICL pool. Unlike traditional active learning methods, which optimize the informativeness of individual examples in isolation, IclForge explicitly accounts for the interdependence of examples within the ICL context. Through extensive experiments across diverse datasets and LLM architectures, we show that IclForge outperforms standard active learning baselines by 1.8–4.5 percentage points (180–450 basis points) while requiring 50% fewer annotations. Our framework is complementary to existing ICL selection strategies and extends naturally to generative applications, which we demonstrate through experiments on Math Word Problem (MWP) tasks. These results highlight IclForge's effectiveness in constructing high-quality ICL exemplar pools in resource-constrained scenarios.
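The abstract does not specify IclForge's selection rule, but the core idea, spending a limited annotation budget on examples whose value depends on what is already in the pool, can be illustrated with a minimal greedy sketch. Everything below (the uncertainty-plus-diversity score, function names, and parameters) is an assumption made for illustration, not the paper's actual method:

```python
# Illustrative sketch only: scores unlabeled candidates by model
# uncertainty plus diversity with respect to the examples already
# chosen, then greedily spends the annotation budget. All names and
# the scoring rule here are hypothetical, not taken from the paper.
import numpy as np

def greedy_pool_selection(cand_embs, uncertainty, budget, alpha=0.5):
    """Pick `budget` candidate indices to annotate for the ICL pool.

    cand_embs:   (n, d) array of candidate example embeddings
    uncertainty: (n,) array of per-candidate model uncertainty scores
    alpha:       trade-off between uncertainty and diversity
    """
    chosen, chosen_embs = [], []
    for _ in range(budget):
        if chosen_embs:
            selected = np.stack(chosen_embs)
            # Diversity: distance of each candidate to its nearest
            # already-selected example (a crude proxy for the
            # interdependence of pool members).
            div = np.linalg.norm(
                cand_embs[:, None, :] - selected[None, :, :], axis=-1
            ).min(axis=1)
        else:
            div = np.ones(len(cand_embs))
        score = alpha * uncertainty + (1 - alpha) * div
        score[chosen] = -np.inf          # never re-select an example
        best = int(np.argmax(score))
        chosen.append(best)
        chosen_embs.append(cand_embs[best])
    return chosen

# Toy usage: 100 candidates with 16-dim embeddings, budget of 8 annotations.
rng = np.random.default_rng(0)
picked = greedy_pool_selection(
    cand_embs=rng.normal(size=(100, 16)),
    uncertainty=rng.uniform(size=100),
    budget=8,
)
print(picked)
```

The nearest-neighbor diversity term is only the simplest way to make each pick depend on the examples already selected; the framework described in the abstract presumably optimizes a more principled notion of interdependence within the ICL context.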