Reviewer2: Optimizing Review Generation Through Prompt Generation Paper • 2402.10886 • Published Feb 16
REBEL: Reinforcement Learning via Regressing Relative Rewards Paper • 2404.16767 • Published Apr 25
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF Paper • 2410.04612 • Published Oct 6