LLMs for customized marketing content generation and evaluation at scale

Haoran Liu; Amir Tahmasbi; Ehtesham Sam Haque; Purak Jain

2025

Offsite marketing is essential in e-commerce: it lets businesses reach customers through external platforms and drive traffic to retail websites. However, most current offsite marketing content is generic, template-based, and poorly aligned with landing pages, which limits its effectiveness. To address these limitations, we propose MarketingFM, a retrieval-augmented marketing content generation system that integrates multiple data sources to produce keyword-specific ad copy with minimal human intervention. We validate MarketingFM through offline human and automated evaluations and large-scale online A/B tests. In a recent experiment, keyword-focused ad copy outperformed template-based ads, achieving up to 9% higher click-through rates (CTR), 12% more impressions, and a 0.38% lower cost-per-click (CPC), demonstrating improved ad ranking and cost efficiency. Despite these gains, manual review of generated ads remains costly and time-consuming. To mitigate this, we introduce AutoEval-Main, an automated evaluation system that combines rule-based metrics with LLM-as-a-Judge approaches to keep evaluations aligned with marketing principles. In experiments on large-scale, high-quality human annotation data, AutoEval-Main achieves an agreement rate of 89.57% with human reviewers. Building on this, we further propose AutoEval-Update, a cost-efficient LLM-human collaborative framework designed to dynamically refine the evaluation prompt and adapt to shifting criteria with minimal human effort. By selectively sampling representative ads for human review and using a critic LLM to generate alignment reports, AutoEval-Update improves evaluation consistency while significantly reducing manual effort. Our experiments show that the critic LLM proposes meaningful refinements that improve alignment between LLM-based and human evaluations. However, human oversight remains crucial for setting thresholds and validating refinements before full-scale deployment.
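To make the evaluation idea concrete, the following is a minimal sketch of how a rule-based gate might be combined with a judge and scored against human verdicts by agreement rate. It is not the AutoEval-Main implementation: the character limit, the keyword rule, the `stub_judge` stand-in for an LLM call, and the sample ads are all assumptions made for illustration.

```python
from typing import Callable, List

MAX_AD_CHARS = 30  # hypothetical length limit, not taken from the paper


def rule_checks(ad: str, keyword: str) -> bool:
    """Hypothetical rule-based gate: length limit and keyword presence."""
    return len(ad) <= MAX_AD_CHARS and keyword.lower() in ad.lower()


def auto_eval(ad: str, keyword: str, judge: Callable[[str, str], bool]) -> bool:
    """Approve an ad only if it passes the rules and the judge accepts it."""
    if not rule_checks(ad, keyword):
        return False  # cheap rules run first, so the judge is skipped on failures
    return judge(ad, keyword)


def agreement_rate(auto: List[bool], human: List[bool]) -> float:
    """Fraction of ads on which automated and human verdicts match."""
    if len(auto) != len(human) or not auto:
        raise ValueError("verdict lists must be non-empty and equal in length")
    return sum(a == h for a, h in zip(auto, human)) / len(auto)


def stub_judge(ad: str, keyword: str) -> bool:
    """Stand-in for an LLM call (assumption): rejects an unsupported superlative."""
    return "best" not in ad.lower()


# Illustrative ads and human verdicts, invented for this sketch.
ads = [
    ("Shop running shoes today", "running shoes"),
    ("Best running shoes ever", "running shoes"),
    ("Great deals on everything in our store now", "running shoes"),
    ("Running shoes for every trail", "running shoes"),
]
human_verdicts = [True, False, False, False]

auto_verdicts = [auto_eval(ad, kw, stub_judge) for ad, kw in ads]
rate = agreement_rate(auto_verdicts, human_verdicts)
print(f"agreement rate: {rate:.2%}")
```

In this toy data the automated pipeline disagrees with the human reviewer on the last ad. Disagreements of that kind are what a human-in-the-loop refinement step would surface for review.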
