Trustworthiness-as-reward: Improving LLM performance on text classification through reinforcement learning
2025
Text classification has become increasingly important with the exponential growth of digital text data, finding applications in sentiment analysis, spam detection, topic categorization, and content moderation across various domains. Our research introduces a novel approach that integrates reinforcement learning with a specialized reasoning path. This methodology enables smaller 7B-parameter language models to improve significantly, reaching performance comparable to larger models (e.g., Claude 3.7) on an open-source PubMed multilabel text classification task. We experimented with 1) Claude 3.7 and DeepSeek-R1-Distill-Qwen-7B (Qwen-7B) zero-shot, 2) Supervised Fine-Tuned (SFT) Qwen-7B, 3) Reinforcement Learning (RL) Qwen-7B, and 4) SFT + RL Qwen-7B. We also experimented with different reasoning paths: 1) no reasoning and 2) Socratic reasoning, as well as different evaluation metrics as the reward: 1) F1 score as reward and 2) trustworthiness (i.e., reasoning-process accuracy) as reward. The training data consist of ~11,000 PubMed publication abstracts; performance was evaluated on a held-out set of ~1,000 abstracts. SFT + RL Qwen-7B with Socratic reasoning and F1 score as reward achieved the highest F1 score of 0.8348. In summary, we propose an innovative post-training paradigm integrating SFT, RL, a Socratic reasoning path, and Trustworthiness-as-Reward. With this paradigm, we doubled the F1 score compared to the base 7B model and achieved a ~0.15 lift in F1 score compared to using SFT alone without reasoning. Our pipeline demonstrates that strategic optimization of smaller models can achieve superior results compared to simply scaling up model size.
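The abstract does not specify how the F1-score reward is computed; as a minimal illustrative sketch (not the paper's implementation), a per-example F1 reward for multilabel classification might look like the following. The function name `f1_reward` and the example label strings are assumptions for illustration only.

```python
def f1_reward(predicted_labels: set[str], gold_labels: set[str]) -> float:
    """Per-example F1 between predicted and gold label sets (illustrative sketch)."""
    if not predicted_labels and not gold_labels:
        return 1.0  # nothing to predict and nothing predicted: treat as fully correct
    true_positives = len(predicted_labels & gold_labels)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted_labels)
    recall = true_positives / len(gold_labels)
    return 2 * precision * recall / (precision + recall)

# Example: the model's parsed output recovers two of three gold labels.
reward = f1_reward(
    {"Neoplasms", "Digestive System Diseases"},
    {"Neoplasms", "Digestive System Diseases", "General Pathological Conditions"},
)
print(round(reward, 4))  # 0.8
```

In an RL setup of this kind, such a scalar would typically be assigned to each generated response so that the policy update favors outputs whose predicted label sets better match the gold annotations.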
Research areas