YuyaoGe's Website
Reinforcement Learning
Reward and Guidance through Rubrics: Promoting Exploration to Improve Multi-Domain Reasoning
Abstract: Reinforcement learning (RL) has shown great promise in enhancing LLM reasoning, but current approaches mainly focus on single domains with verifiable rewards. We propose RGR-GRPO, a rubric-driven RL framework for multi-domain reasoning that uses rubrics to provide fine-grained reward signals and offline guidance.
Baolong Bi, Shenghua Liu, Yiwei Wang, Siqian Tong, Lingrui Mei, Yuyao Ge 葛钰峣, Yilong Xu, Jiafeng Guo, Xueqi Cheng
Nov 15, 2025
PDF
arXiv
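The abstract names two ingredients: rubric-based, fine-grained rewards and a GRPO-style policy update. The sketch below illustrates only the generic combination of those two ideas and is not the paper's method. It assumes that a rubric is a list of weighted pass/fail criteria, that a response's reward is the weighted fraction of criteria it satisfies, and that advantages are computed GRPO-style by normalizing each reward against the mean and standard deviation of its sampled group. `RubricItem`, `rubric_reward`, `group_relative_advantages`, and the toy string checks are hypothetical illustrations. The paper's offline guidance component is not modeled.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable, List


@dataclass
class RubricItem:
    """Hypothetical rubric criterion: a weight plus a pass/fail check."""
    description: str
    weight: float
    check: Callable[[str], bool]  # assumed: True if the response meets the criterion


def rubric_reward(response: str, rubric: List[RubricItem]) -> float:
    """Assumed scoring rule: weighted fraction of satisfied criteria, in [0, 1]."""
    total = sum(item.weight for item in rubric)
    if total == 0:
        return 0.0
    earned = sum(item.weight for item in rubric if item.check(response))
    return earned / total


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: (reward - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std of the group's rewards
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Toy rubric with string-matching checks, for illustration only.
    rubric = [
        RubricItem("states the final answer", 2.0, lambda s: "answer:" in s.lower()),
        RubricItem("shows intermediate steps", 1.0, lambda s: "step" in s.lower()),
        RubricItem("mentions units", 1.0, lambda s: "km" in s),
    ]
    # A group of sampled responses to the same prompt.
    group = [
        "Step 1: ... Step 2: ... Answer: 42 km",
        "Answer: 42",
        "I am not sure.",
    ]
    rewards = [rubric_reward(r, rubric) for r in group]
    advantages = group_relative_advantages(rewards)
    print(rewards)
    print([round(a, 3) for a in advantages])
```

Here the three responses earn rewards of 1.0, 0.5, and 0.0, so the group-normalized advantages are positive, zero, and negative respectively. Compared with a single binary correctness signal, a partially correct response still receives graded credit.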