Reinforcement Learning

Reward and Guidance through Rubrics: Promoting Exploration to Improve Multi-Domain Reasoning

Abstract: Reinforcement learning (RL) has shown great promise in enhancing the reasoning of large language models (LLMs), but current approaches mainly focus on single domains with verifiable rewards. We propose RGR-GRPO, a rubric-driven RL framework for multi-domain reasoning that uses rubrics to provide both fine-grained reward signals and offline guidance.

Baolong Bi, Shenghua Liu, Yiwei Wang, Siqian Tong, Lingrui Mei, Yuyao Ge 葛钰峣, Yilong Xu, Jiafeng Guo, Xueqi Cheng

Nov 15, 2025
