di-zhang-fdu 
posted an update Dec 11, 2024
LLaMA-O1-PRM and LLaMA-O1-Reinforcement will be released this weekend.
We have implemented a novel reinforcement fine-tuning (RFT) pipeline that teaches models reasoning and reward labeling without human annotation.

Looking forward to it


Not perfect, but it just works :)
