RLHFlow/Llama3-SFT-v2.0-epoch2


This is the SFT checkpoint used in the RLHFlow/Online-RLHF project.

Paper: RLHF Workflow: From Reward Modeling to Online RLHF (Published in TMLR, 2024)
Authors: Hanze Dong*, Wei Xiong*, Bo Pang*, Haoxiang Wang*, Han Zhao, Yingbo Zhou, Nan Jiang, Doyen Sahoo, Caiming Xiong, Tong Zhang
Code: https://github.com/RLHFlow/Online-RLHF

The model is trained from meta-llama/Meta-Llama-3-8B on RLHFlow/RLHFlow-SFT-Dataset-ver2 for 2 epochs. We use a global batch size of 128 and a learning rate of 2e-5; the samples are packed and split into chunks of 8192 tokens. Further training details are in the config file: https://github.com/RLHFlow/Online-RLHF/blob/main/sft/llama3-8b-it.yaml
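The packing step mentioned above can be illustrated with a minimal sketch. This is not the project's training code: the function name `pack_and_chunk`, the optional `eos_id` separator, and the choice to drop the trailing partial chunk are all assumptions made for illustration. The actual behavior is determined by the training config linked above.

```python
from typing import Iterable, List, Optional


def pack_and_chunk(
    samples: Iterable[List[int]],
    chunk_len: int = 8192,
    eos_id: Optional[int] = None,
) -> List[List[int]]:
    """Concatenate tokenized samples into one stream and cut it into
    fixed-length chunks (hypothetical helper, for illustration only)."""
    stream: List[int] = []
    for ids in samples:
        stream.extend(ids)
        # Assumption: optionally separate samples with an end-of-sequence token.
        if eos_id is not None:
            stream.append(eos_id)
    # Assumption: the trailing partial chunk is dropped rather than padded.
    n_full = len(stream) // chunk_len
    return [stream[i * chunk_len:(i + 1) * chunk_len] for i in range(n_full)]


# Toy usage with a small chunk length so the result is easy to inspect.
toy_chunks = pack_and_chunk([[1, 2, 3], [4, 5], [6, 7, 8, 9]], chunk_len=4)

# If every element of the global batch is a full chunk (an assumption),
# one optimizer step would see at most this many tokens:
tokens_per_step = 128 * 8192
```

Packing avoids spending compute on padding: short samples share a chunk instead of each being padded to the full 8192-token length.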

Citation

Please cite our technical report if you find our model useful for your research or product.

@misc{dong2024rlhf,
      title={RLHF Workflow: From Reward Modeling to Online RLHF}, 
      author={Hanze Dong and Wei Xiong and Bo Pang and Haoxiang Wang and Han Zhao and Yingbo Zhou and Nan Jiang and Doyen Sahoo and Caiming Xiong and Tong Zhang},
      year={2024},
      eprint={2405.07863},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}


Safetensors · Model size: 8.03B params · Tensor type: BF16

