@di-zhang-fdu on Hugging Face: "LLaMA-O1-PRM and LLaMA-O1-Reinforcement will release in this weekend. We have…"


di-zhang-fdu

posted an update Dec 11, 2024


LLaMA-O1-PRM and LLaMA-O1-Reinforcement will be released this weekend.
We have implemented a novel reinforcement fine-tuning (RFT) pipeline that teaches models to reason and to label rewards without human annotation.

AlexLINB

Dec 11, 2024

Looking forward to it

di-zhang-fdu

Dec 11, 2024

Not perfect, but it just works :)

Teera

Dec 11, 2024

In this post

di-zhang-fdu Di Zhang
AlexLINB AlexLI
Teera Narak A'