HF CMU Collab

community

AI & ML interests

None defined yet.

hf-cmu-collab's activity

edbeeching

authored a paper about 15 hours ago

Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning

Paper • 2503.07572 • Published 3 days ago • 25

lewtun

authored a paper about 15 hours ago

Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning

Paper • 2503.07572 • Published 3 days ago • 25

CohenQu

authored a paper about 21 hours ago

Recursive Introspection: Teaching Language Model Agents How to Self-Improve

Paper • 2407.18219 • Published Jul 25, 2024 • 3

CohenQu

authored 2 papers about 22 hours ago

Guided Data Augmentation for Offline Reinforcement Learning and Imitation Learning

Paper • 2310.18247 • Published Oct 27, 2023

Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning

Paper • 2503.07572 • Published 3 days ago • 25

lewtun

posted an update 1 day ago

Introducing OlympicCoder: a series of open reasoning models that can solve olympiad-level programming problems 🧑‍💻

- 7B open-r1/OlympicCoder-7B
- 32B open-r1/OlympicCoder-32B

We find that OlympicCoder models outperform Claude 3.7 Sonnet, as well as others over 100x larger 💪

Together with the models, we are releasing:

📊 CodeForces-CoTs: a new dataset of code problems from the most popular competitive coding platform, with R1 traces in C++ and Python: open-r1/codeforces-cots

🏆 IOI'2024: a new benchmark of VERY hard programming problems where even frontier models struggle to match human performance open-r1/ioi

For links to the models and datasets, check out our latest progress report from Open R1: https://huggingface.co/blog/open-r1/update-3

aviralku

authored 13 papers 30 days ago

Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models

Paper • 2310.10639 • Published Oct 16, 2023 • 3

Vision-Language Models Provide Promptable Representations for Reinforcement Learning

Paper • 2402.02651 • Published Feb 5, 2024

RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold

Paper • 2406.14532 • Published Jun 20, 2024

Recursive Introspection: Teaching Language Model Agents How to Self-Improve

Paper • 2407.18219 • Published Jul 25, 2024 • 3

Generative Verifiers: Reward Modeling as Next-Token Prediction

Paper • 2408.15240 • Published Aug 27, 2024 • 13

Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning

Paper • 2410.08146 • Published Oct 10, 2024

Steering Your Generalists: Improving Robotic Foundation Models via Value Guidance

Paper • 2410.13816 • Published Oct 17, 2024 • 2

Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data

Paper • 2412.07762 • Published Dec 10, 2024

Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models

Paper • 2412.15287 • Published Dec 18, 2024

Value-Based Deep RL Scales Predictably

Paper • 2502.04327 • Published Feb 6 • 6

lewtun

posted an update about 1 month ago

Introducing OpenR1-Math-220k!

open-r1/OpenR1-Math-220k

The community has been busy distilling DeepSeek-R1 from inference providers, but we decided to have a go at doing it ourselves from scratch 💪

What’s new compared to existing reasoning datasets?

♾ Based on AI-MO/NuminaMath-1.5: we focus on math reasoning traces and generate answers for problems in NuminaMath 1.5, an improved version of the popular NuminaMath-CoT dataset.

🐳 800k R1 reasoning traces: We generate two answers for 400k problems using DeepSeek R1. The filtered dataset contains 220k problems with correct reasoning traces.

📀 512 H100s running locally: Instead of relying on an API, we leverage vLLM and SGLang to run generations locally on our science cluster, generating 180k reasoning traces per day.

⏳ Automated filtering: We apply Math Verify to retain only problems with at least one correct answer. We also leverage Llama3.3-70B-Instruct as a judge to retrieve more correct examples (e.g., for cases with malformed answers that can’t be verified with a rule-based parser).

📊 We match the performance of DeepSeek-Distill-Qwen-7B by finetuning Qwen-7B-Math-Instruct on our dataset.

🔎 Read our blog post for all the nitty-gritty details: https://huggingface.co/blog/open-r1/update-2
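The filtering rule described in the post (keep a problem only if at least one of its generated answers is correct) can be illustrated with a minimal sketch. This is not the Open R1 pipeline: `filter_problems`, `exact_match`, and `days_to_generate` are hypothetical names, `exact_match` is a toy stand-in for a real verifier such as Math Verify, and keeping only the correct answers for each retained problem is an assumption made for the example. The throughput helper assumes a constant generation rate.

```python
from typing import Callable, Dict, List


def exact_match(answer: str, reference: str) -> bool:
    # Hypothetical stand-in for a real verifier (the post uses Math Verify).
    return answer.strip() == reference.strip()


def filter_problems(
    generations: Dict[str, List[str]],
    references: Dict[str, str],
    is_correct: Callable[[str, str], bool],
) -> Dict[str, List[str]]:
    """Keep problems that have at least one correct generated answer.

    Assumption for this sketch: only the correct answers are retained
    for each surviving problem.
    """
    kept: Dict[str, List[str]] = {}
    for problem_id, answers in generations.items():
        correct = [a for a in answers if is_correct(a, references[problem_id])]
        if correct:  # drop problems with no verified answer
            kept[problem_id] = correct
    return kept


def days_to_generate(total_traces: int, traces_per_day: int) -> float:
    # Rough estimate that assumes a constant generation rate.
    return total_traces / traces_per_day


if __name__ == "__main__":
    generations = {"p1": ["4", "5"], "p2": ["7", "8"]}
    references = {"p1": "4", "p2": "9"}
    print(filter_problems(generations, references, exact_match))
    # Figures quoted in the post: 800k traces at 180k traces per day.
    print(round(days_to_generate(800_000, 180_000), 1))
```

With the figures quoted in the post, the estimate works out to roughly four and a half days of generation, under the constant-rate assumption.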

Team members 7