LLMs for customized marketing content generation and evaluation at scale
2025
Offsite marketing is essential in e-commerce: it lets businesses reach customers through external platforms and drive traffic to retail websites. However, most current offsite marketing content is generic, template-based, and poorly aligned with landing pages, which limits its effectiveness. To address these limitations, we propose MarketingFM, a retrieval-augmented marketing content generation system that integrates multiple data sources to produce keyword-specific ad copy with minimal human intervention. We validate MarketingFM through offline human and automated evaluations and large-scale online A/B tests. In a recent experiment, keyword-focused ad copy outperformed template-based ads, achieving up to 9% higher click-through rates (CTR), 12% more impressions, and a 0.38% lower cost-per-click (CPC), demonstrating improved ad ranking and cost efficiency. Despite these gains, manual review of generated ads remains costly and time-consuming. To mitigate this, we introduce AutoEval-Main, an automated evaluation system that combines rule-based metrics with LLM-as-a-Judge approaches to keep evaluations aligned with marketing principles. In experiments on large-scale, high-quality human annotation data, AutoEval-Main achieves an agreement rate of 89.57% with human reviewers. Building on this, we further propose AutoEval-Update, a cost-efficient LLM-human collaborative framework that dynamically refines the evaluation prompt and adapts to shifting criteria. By selectively sampling representative ads for human review and using a critic LLM to generate alignment reports, AutoEval-Update improves evaluation consistency while significantly reducing manual effort. Our experiments show that the critic LLM proposes meaningful refinements that bring LLM-based evaluations closer to human judgments. However, human oversight remains crucial for setting thresholds and validating refinements before full-scale deployment.
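To make the evaluation setup concrete, the following is a minimal sketch of how a rule-based check might be combined with an LLM-as-a-Judge verdict, and how an agreement rate against human labels could be computed. It is an illustration under stated assumptions, not the AutoEval-Main implementation: the function names, the 30-character limit, the keyword rule, and the stub judge are all hypothetical, and a real system would call an actual LLM where the stub appears.

```python
from typing import Callable, Sequence

# Hypothetical character limit for an ad headline (not taken from the paper).
MAX_HEADLINE_CHARS = 30


def rule_check(ad_text: str, keyword: str) -> bool:
    """Illustrative rule-based metric: length limit plus keyword presence."""
    within_limit = len(ad_text) <= MAX_HEADLINE_CHARS
    has_keyword = keyword.lower() in ad_text.lower()
    return within_limit and has_keyword


def evaluate_ad(ad_text: str, keyword: str,
                judge: Callable[[str, str], bool]) -> bool:
    """Accept an ad only if it passes the rules and the judge approves it."""
    return rule_check(ad_text, keyword) and judge(ad_text, keyword)


def agreement_rate(auto: Sequence[bool], human: Sequence[bool]) -> float:
    """Fraction of ads on which automated and human verdicts match."""
    if len(auto) != len(human) or len(auto) == 0:
        raise ValueError("verdict lists must be non-empty and equal in length")
    matches = sum(a == h for a, h in zip(auto, human))
    return matches / len(auto)


def stub_judge(ad_text: str, keyword: str) -> bool:
    """Placeholder for an LLM call; here it only rejects shouty punctuation."""
    return "!!!" not in ad_text


# Toy data: (ad text, target keyword) pairs with made-up human labels.
ads = [
    ("Trail running shoes on sale", "running shoes"),
    ("Buy now!!!", "running shoes"),
    ("Lightweight running shoes", "running shoes"),
    ("This headline is far too long for the limit running shoes", "running shoes"),
]
human_labels = [True, False, False, False]

auto_labels = [evaluate_ad(text, kw, stub_judge) for text, kw in ads]
rate = agreement_rate(auto_labels, human_labels)
print(f"agreement rate: {rate:.2%}")
```

Raw agreement is the simplest way to compare automated and human verdicts. When accepted and rejected ads are heavily imbalanced, a chance-corrected statistic such as Cohen's kappa may be more informative.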