view article Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge By NormalUhr • about 1 month ago • 63
view article Article Illustrating Reinforcement Learning from Human Feedback (RLHF) Dec 9, 2022 • 192
SimpleRL Collection The collection for the Project "Simple Reinforcement Learning for Reasoning" • 2 items • Updated 19 days ago • 5
CodeI/O Collection Collection for CodeI/O @ https://codei-o.github.io/ • 15 items • Updated 25 days ago • 6
NuminaMath Collection Datasets and models for training SOTA math LLMs. See our GitHub for training & inference code: https://github.com/project-numina/aimo-progress-prize • 7 items • Updated 28 days ago • 76
Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2 Paper • 2502.03544 • Published Feb 5 • 43
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published about 1 month ago • 122
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning Paper • 2502.06781 • Published 27 days ago • 60
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling Paper • 2502.06703 • Published 27 days ago • 142
OpenR1-Math Collection Dataset and SFT model distilled from DeepSeek-R1. Check out our blog post for more details: https://huggingface.co./blog/open-r1/update-2 • 3 items • Updated 23 days ago • 7
🧠 Reasoning datasets Collection Datasets with reasoning traces for math and code released by the community • 14 items • Updated 1 day ago • 91