LLMEvalRec: An agentic framework for simulating users to evaluate news recommendation systems
2026
Evaluating news recommendation systems (NRS) presents unique challenges due to their dynamic and interactive nature and users' evolving interests. In the early stages of development, when user bases and historical data are scarce, it is difficult to conduct meaningful offline or online evaluations. This cold-start evaluation challenge hinders data-driven decision-making for product development and deployment. To address this, we propose LLMEvalRec, a framework that leverages Large Language Model (LLM) agents to simulate user behavior for NRS evaluation. Our approach features generative agents that automatically construct user profiles from a small number of user reading histories and perform realistic actions, and introduces the Guided Episodic Search (GUES) algorithm, which guides automated prompt optimization by drawing on human prompt engineering practices. Experiments demonstrate that LLMEvalRec-generated data achieves a Spearman correlation of 0.97 with real evaluation rankings, significantly outperforming baseline simulators (0.40 and -0.05), and successfully predicts relative performance trends on both the MIND benchmark and real customer datasets. Validation in a production environment shows consistent alignment between simulated metrics and real click-through rate (CTR) improvements.
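To illustrate the evaluation metric used above: the quality of a simulator can be judged by how well its ranking of NRS variants agrees with the ranking from a real evaluation, measured by Spearman rank correlation. A minimal sketch, with hypothetical rankings (the numbers below are illustrative, not from the paper):

```python
def spearman_rho(real_ranks, sim_ranks):
    """Spearman rank correlation for rankings without ties:
    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(real_ranks)
    d_squared = sum((r - s) ** 2 for r, s in zip(real_ranks, sim_ranks))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Hypothetical example: 5 NRS variants ranked by a real evaluation
# versus by a user simulator (ranks 1 = best).
real = [1, 2, 3, 4, 5]
simulated = [1, 2, 4, 3, 5]   # simulator swaps variants 3 and 4
print(spearman_rho(real, simulated))  # -> 0.9
```

A correlation near 1 (such as the 0.97 reported for LLMEvalRec) means the simulator preserves the real ordering of systems almost exactly, while values near 0 or below (the baselines' 0.40 and -0.05) mean its rankings carry little or no signal.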