# LLaMA-3.2-1B-Instruct Post-Trained with GRPO (from DeepSeek)
This model is a post-trained version of LLaMA-3.2-1B-Instruct, tuned with GRPO (Group Relative Policy Optimization, the method introduced by DeepSeek) on the GSM8K dataset.
## Model Details
- Base Model: LLaMA-3.2-1B-Instruct
- Training Data: openai/gsm8k
- Post-training Steps: 1000 (a training-setup sketch follows this list)
- Checkpoint: `checkpoint-1000/`
- Framework: Hugging Face `transformers`
- Usage: Mathematical Reasoning.
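
The exact training script is not included in this card. Below is a minimal sketch of what a GRPO post-training run on GSM8K could look like using TRL's `GRPOTrainer`; the reward function and hyperparameters are assumptions for illustration, not the actual recipe used for this checkpoint.

```python
# Hypothetical sketch of GRPO post-training on GSM8K with TRL's GRPOTrainer.
# The reward function and hyperparameters are assumptions, not the actual recipe.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("openai/gsm8k", "main", split="train")
# GRPOTrainer expects a "prompt" column; GSM8K provides "question" and "answer".
dataset = dataset.map(lambda x: {"prompt": x["question"]})

def correctness_reward(completions, answer, **kwargs):
    """Reward 1.0 if the completion contains the gold final answer, else 0.0."""
    gold = [a.split("####")[-1].strip() for a in answer]
    return [1.0 if g in c else 0.0 for c, g in zip(completions, gold)]

training_args = GRPOConfig(
    output_dir="Llama-3.2-1B-GRPO-gsm8k",
    max_steps=1000,                 # matches the 1000 post-training steps above
    num_generations=8,              # completions sampled per prompt (assumed)
    per_device_train_batch_size=8,
)
trainer = GRPOTrainer(
    model="meta-llama/Llama-3.2-1B-Instruct",
    reward_funcs=correctness_reward,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```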
## How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "accuracy-maker/Llama-3.2-1B-GRPO-gsm8k"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

input_text = "What is the capital of France?"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
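
Since the underlying model is instruction-tuned, prompts formatted with the tokenizer's chat template may work better for GSM8K-style word problems. The snippet below is a sketch of that approach; the exact prompt format used during GRPO training is not documented in this card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "accuracy-maker/Llama-3.2-1B-GRPO-gsm8k"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# A GSM8K-style word problem, formatted with the tokenizer's chat template.
messages = [
    {
        "role": "user",
        "content": (
            "Natalia sold clips to 48 of her friends in April, and then she sold "
            "half as many clips in May. How many clips did Natalia sell altogether "
            "in April and May?"
        ),
    }
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```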