OLMo-1B-0724 SFT

OLMo-1B-0724-hf finetuned for 5 epochs with a learning rate of 1e-5 on the Tulu 2 dataset (specifically, this version). I used a batch size of 1 with 128 gradient accumulation steps, linear warmup for the first 3% of training, then linear decay to 0.
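
For concreteness, here is a minimal sketch of the learning-rate schedule described above (linear warmup over the first 3% of steps, then linear decay to 0). The function and step counts are illustrative only and are not taken from the actual training code.

```python
def lr_at_step(step: int, total_steps: int, peak_lr: float = 1e-5, warmup_frac: float = 0.03) -> float:
    """Illustrative linear warmup + linear decay schedule (not the exact training run)."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        # Linear warmup from 0 up to peak_lr over the first 3% of training.
        return peak_lr * step / warmup_steps
    # Linear decay from peak_lr down to 0 over the remaining steps.
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))
```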

I've also released an 'instruct' version, which has additionally gone through DPO training. That model is generally more performant (see the metrics below), so check it out!

Evals are as follows:

| Metric | OLMo-1B-0724-hf | OLMo-1B-0724-SFT-hf (this model!) | OLMo-1B-0724-Instruct-hf |
|---|---|---|---|
| MMLU 0-shot | 25.0 | 36.0 | 36.7 |
| GSM8k CoT 8-shot | 7.0 | 12.5 | 12.5 |
| BBH CoT 3-shot | 22.5 | 27.2 | 30.6 |
| HumanEval P@10 | 16.0 | 21.2 | 22.0 |
| AlpacaEval 1 | - | 41.5 | 50.9 |
| AlpacaEval 2 LC | - | 2.7 | 2.5 |
| Toxigen % Toxic | 80.3 | 59.7 | 14.1 |
| TruthfulQA %Info+True | 23.0 | 40.9 | 42.2 |
| IFEval Loose Acc | 20.5 | 26.1 | 24.2 |
| XSTest F1 | 67.6 | 81.9 | 79.8 |
| Average of above metrics | 25.2 | 33.0 | 38.7 |

Model training and evaluation were performed using Open-instruct, so check that out for more details on evaluation.
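
As a quick usage sketch, the model can be loaded with the standard Hugging Face transformers API. The chat-style prompt below (Tulu-style <|user|>/<|assistant|> tags) is an assumption based on the Tulu 2 training data, so adjust the template if your setup differs.

```python
# Minimal inference sketch (assumes the standard transformers API and a
# Tulu-style chat format; not the exact evaluation setup from this card).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hamishivi/OLMo-1B-0724-SFT-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "<|user|>\nWhat is the capital of France?\n<|assistant|>\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```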
