Raja Biswas

rbiswasfc

AI & ML interests

NLP, Generative AI

Recent Activity

updated a dataset 6 days ago

rbiswasfc/r1-7b

published a dataset 6 days ago

rbiswasfc/r1-7b

upvoted an article 15 days ago

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge

rbiswasfc's activity

upvoted 2 articles 15 days ago

Article

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge

about 1 month ago

• 63

Article

Illustrating Reinforcement Learning from Human Feedback (RLHF)

Dec 9, 2022

• 192

upvoted 2 collections 19 days ago

SimpleRL

The collection for the Project "Simple Reinforcement Learning for Reasoning" • 2 items • Updated 19 days ago • 5

CodeI/O

Collection for CodeI/O @ https://codei-o.github.io/ • 15 items • Updated 25 days ago • 6

upvoted a paper 22 days ago

Learn Your Reference Model for Real Good Alignment

Paper • 2404.09656 • Published Apr 15, 2024 • 84

upvoted an article 22 days ago

Article

How NuminaMath Won the 1st AIMO Progress Prize

Jul 11, 2024

• 118

upvoted a collection 22 days ago

NuminaMath

Datasets and models for training SOTA math LLMs. See our GitHub for training & inference code: https://github.com/project-numina/aimo-progress-prize • 7 items • Updated 28 days ago • 76

upvoted an article 25 days ago

Article

1 Billion Classifications

25 days ago

• 42

upvoted 4 papers 27 days ago

Gold-medalist Performance in Solving Olympiad Geometry with AlphaGeometry2

Paper • 2502.03544 • Published Feb 5 • 43

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach

Paper • 2502.05171 • Published about 1 month ago • 122

Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning

Paper • 2502.06781 • Published 27 days ago • 60

Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling

Paper • 2502.06703 • Published 27 days ago • 142

upvoted 2 collections 27 days ago

OpenR1-Math

Dataset and SFT model distilled from DeepSeek-R1. Check out our blog post for more details: https://huggingface.co/blog/open-r1/update-2 • 3 items • Updated 23 days ago • 7

🧠 Reasoning datasets

Datasets with reasoning traces for math and code released by the community • 14 items • Updated 1 day ago • 91

upvoted a paper 27 days ago

The Curse of Depth in Large Language Models

Paper • 2502.05795 • Published 29 days ago • 35

upvoted an article 27 days ago

Article

Open R1: Update #2

By

and 6 others •

27 days ago

• 197

upvoted a paper 28 days ago

On Teacher Hacking in Language Model Distillation

Paper • 2502.02671 • Published Feb 4 • 18

upvoted an article 28 days ago

Article

Open-R1: a fully open reproduction of DeepSeek-R1

Jan 28

• 795

upvoted 2 papers 28 days ago

Demystifying Long Chain-of-Thought Reasoning in LLMs

Paper • 2502.03373 • Published Feb 5 • 55

LIMO: Less is More for Reasoning

Paper • 2502.03387 • Published Feb 5 • 57