@qq8933 on Hugging Face: "LLaMA-O1: Open Large Reasoning Model Frameworks For Training, Inference and…"

qq8933

posted an update Nov 3

LLaMA-O1: Open Large Reasoning Model Frameworks For Training, Inference and Evaluation With PyTorch and HuggingFace
Large Reasoning Models powered by Monte Carlo Tree Search (MCTS), Self-Play Reinforcement Learning, PPO, AlphaGo Zero's dual-policy paradigm, and Large Language Models!
https://github.com/SimpleBerry/LLaMA-O1/
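The post doesn't show how LLaMA-O1 runs its tree search. As a generic illustration only, here is a minimal sketch of the textbook UCT (UCB1) selection and backpropagation steps of MCTS. It is not the repository's code. The `Node` class, its fields, and the idea that `state` holds a partial reasoning trace are assumptions made for illustration. AlphaGo Zero itself uses a prior-weighted PUCT variant rather than plain UCT.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical search-tree node; `state` could hold a partial reasoning trace (assumption)."""
    state: str
    visits: int = 0
    value_sum: float = 0.0
    children: list[Node] = field(default_factory=list)


def uct_score(child: Node, parent_visits: int, c: float = math.sqrt(2)) -> float:
    """Standard UCT: mean value (exploitation) plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # always try unvisited children first
    exploit = child.value_sum / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def select_child(parent: Node, c: float = math.sqrt(2)) -> Node:
    """Pick the child with the highest UCT score."""
    return max(parent.children, key=lambda ch: uct_score(ch, parent.visits, c))


def backpropagate(path: list[Node], reward: float) -> None:
    """Add one visit and the rollout reward to every node on the selected path."""
    for node in path:
        node.visits += 1
        node.value_sum += reward


# Usage: "a" has the higher mean value, so UCT prefers it over "b".
a = Node("a", visits=5, value_sum=4.0)
b = Node("b", visits=4, value_sum=1.0)
root = Node("root", visits=9, children=[a, b])
chosen = select_child(root)
backpropagate([root, chosen], reward=1.0)
```

The exploration constant `c` trades off revisiting high-value branches against trying less-explored ones. `math.sqrt(2)` is the conventional UCB1 default, not a value taken from LLaMA-O1.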

What will happen when you compound MCTS ❤ LLM ❤ Self-Play ❤ RLHF?
Just a little bite of strawberry!🍓

Past related works:
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning (2410.02884)
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B (2406.07394)

Nov 4

Awesome work. Can we fine-tune this reasoning model further?

qq8933

Nov 5

main.py is the entry point for fine-tuning, but the code still needs further improvement; see 'Call for contributors'.