Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning Paper • 2503.07572 • Published 24 days ago • 40
Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published Sep 19, 2024 • 139
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters Paper • 2408.03314 • Published Aug 6, 2024 • 60
DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning Paper • 2406.11896 • Published Jun 14, 2024 • 20
Stop Regressing: Training Value Functions via Classification for Scalable Deep RL Paper • 2403.03950 • Published Mar 6, 2024 • 16
Robotic Offline RL from Internet Videos via Value-Function Pre-Training Paper • 2309.13041 • Published Sep 22, 2023 • 8
Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions Paper • 2309.10150 • Published Sep 18, 2023 • 25