SFT Models
Updated Nov 3
We train a series of SFT models on RLHFlow's high-quality SFT dataset for research purposes.
RLHFlow/LLaMA3-SFT • Text Generation • Updated Nov 3 • 5.75k • 8
RLHFlow/RLHFlow-SFT-Dataset-ver2 • Viewer • Updated Nov 2 • 2.32M • 77 • 4
RLHFlow/LLaMA3-SFT-v2 • Text Generation • Updated Nov 3 • 2.02k
RLHFlow/Llama3-SFT-v2.0-epoch1 • Text Generation • Updated Nov 3 • 11
RLHFlow/Llama3-SFT-v2.0-epoch2 • Text Generation • Updated Nov 3 • 11
RLHFlow/Llama3-SFT-v2.0-epoch3 • Text Generation • Updated Nov 3 • 546