RLHFlow

university


AI & ML interests

Workflow of Reinforcement Learning from Human Feedback (RLHF). Blog: https://rlhflow.github.io/


RLHFlow's activity

hendrydong authored a paper 2 days ago

Offline Reinforcement Learning for LLM Multi-Step Reasoning

Paper • arXiv 2412.16145 • Published 5 days ago • 30 upvotes

New activity from hendrydong in RLHFlow/LLaMA3.2-1B-SFT about 1 month ago

Discussion #1: "the training data for this model?" (opened about 1 month ago)

weqweasdas updated 3 datasets about 2 months ago

RLHFlow/DS-and-Mistral-PRM-Data

Viewer • Updated Nov 10 • 526k rows • 32 downloads

RLHFlow/Deepseek-MATH500-Test

Viewer • Updated Nov 9 • 500 rows • 99 downloads

RLHFlow/Mistral-MATH500-Test

Viewer • Updated Nov 9 • 500 rows • 120 downloads

weqweasdas updated 2 models about 2 months ago

RLHFlow/Llama3.1-8B-PRM-Mistral-Data

Text Generation • Updated Nov 9 • 1.98k downloads • 6 likes

RLHFlow/Llama3.1-8B-PRM-Deepseek-Data

Text Generation • Updated Nov 9 • 5.95k downloads • 24 likes

weqweasdas updated 4 datasets about 2 months ago

RLHFlow/Deepseek-ORM-Data

Viewer • Updated Nov 9 • 253k rows • 56 downloads • 1 like

RLHFlow/Deepseek-PRM-Data

Viewer • Updated Nov 9 • 253k rows • 47 downloads • 4 likes

RLHFlow/Mistral-ORM-Data

Viewer • Updated Nov 9 • 273k rows • 100 downloads • 2 likes

RLHFlow/Mistral-PRM-Data

Viewer • Updated Nov 9 • 273k rows • 137 downloads • 7 likes

weqweasdas updated 2 models about 2 months ago

RLHFlow/Llama3.1-8B-ORM-Deepseek-Data

Text Generation • Updated Nov 9 • 82 downloads

RLHFlow/Llama3.1-8B-ORM-Mistral-Data

Text Generation • Updated Nov 9 • 392 downloads

weqweasdas updated a collection about 2 months ago

RLHFlow MATH Process Reward Model

A collection of datasets and models for process reward modeling. • 15 items • Updated Nov 9 • 7 upvotes

weqweasdas updated 6 datasets about 2 months ago

RLHFlow/Mistral-MATH500-Test-Result-of-Mistral-PRM

Viewer • Updated Nov 8 • 500 rows • 34 downloads

RLHFlow/Mistral-MATH500-Test-Result-of-Mistral-ORM

Viewer • Updated Nov 8 • 500 rows • 39 downloads

RLHFlow/Mistral-GSM8K-Test-Result-of-Mistral-ORM

Viewer • Updated Nov 8 • 1.32k rows • 32 downloads

RLHFlow/DS-MATH500-Test-Result-of-Mistral-ORM

Viewer • Updated Nov 8 • 500 rows • 35 downloads

RLHFlow/DS-GSM8K-Test-Result-of-Mistral-ORM

Viewer • Updated Nov 8 • 1.32k rows • 32 downloads

RLHFlow/DS-GSM8K-Test-Result-of-DS-ORM

Viewer • Updated Nov 8 • 1.32k rows • 32 downloads