Pearl: A Production-ready Reinforcement Learning Agent Paper • 2312.03814 • Published Dec 6, 2023 • 14
Secrets of RLHF in Large Language Models Part II: Reward Modeling Paper • 2401.06080 • Published Jan 11, 2024 • 26
Contrastive Preference Learning: Learning from Human Feedback without RL Paper • 2310.13639 • Published Oct 20, 2023 • 24
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback Paper • 2402.01391 • Published Feb 2, 2024 • 41
IRCoCo: Immediate Rewards-Guided Deep Reinforcement Learning for Code Completion Paper • 2401.16637 • Published Jan 30, 2024 • 1