DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge (article by NormalUhr)
Illustrating Reinforcement Learning from Human Feedback (RLHF) (article, Dec 9, 2022)
The Ultra-Scale Playbook 🌌: the ultimate guide to training LLMs on large GPU clusters
SimpleRL (collection): the collection for the project "Simple Reinforcement Learning for Reasoning" • 2 items
CodeI/O (collection): collection for CodeI/O @ https://codei-o.github.io/ • 15 items
NuminaMath (collection): datasets and models for training SOTA math LLMs. See our GitHub for training & inference code: https://github.com/project-numina/aimo-progress-prize • 7 items