Built with Axolotl

167c575a-5c70-46aa-a2b2-0e7569616426

This model is a fine-tuned version of NousResearch/Meta-Llama-3-8B-Alternate-Tokenizer on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 2.4683

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.000201
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 10
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: AdamW (bitsandbytes 8-bit, adamw_bnb_8bit) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 50
  • training_steps: 500
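
For orientation, here is a minimal sketch of how the values above map onto transformers' TrainingArguments. This is a reconstruction from the reported hyperparameters, not the author's actual Axolotl config; output_dir is a placeholder.

```python
from transformers import TrainingArguments

# Reconstruction of the reported hyperparameters; output_dir is a placeholder.
args = TrainingArguments(
    output_dir="outputs",                # hypothetical path
    learning_rate=0.000201,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=10,
    gradient_accumulation_steps=2,       # effective (total) train batch size: 4 * 2 = 8
    optim="adamw_bnb_8bit",              # OptimizerNames.ADAMW_BNB
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=50,
    max_steps=500,
)
```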

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| No log        | 0.0005 | 1    | 2.7530          |
| 2.7993        | 0.0248 | 50   | 2.6631          |
| 2.6363        | 0.0495 | 100  | 2.6600          |
| 2.7406        | 0.0743 | 150  | 2.7099          |
| 2.6545        | 0.0990 | 200  | 2.5883          |
| 2.6782        | 0.1238 | 250  | 2.5581          |
| 2.6232        | 0.1485 | 300  | 2.5457          |
| 2.7885        | 0.1733 | 350  | 2.4967          |
| 2.6768        | 0.1980 | 400  | 2.4765          |
| 2.7833        | 0.2228 | 450  | 2.4695          |
| 2.6698        | 0.2475 | 500  | 2.4683          |
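
As a rough consistency check (derived from the table, not stated in the card): with a total train batch size of 8, the 500 recorded steps cover 500 × 8 = 4,000 training examples; since that corresponds to epoch 0.2475, the training set would contain roughly 4,000 / 0.2475 ≈ 16,000 examples.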

Framework versions

  • PEFT 0.13.2
  • Transformers 4.46.0
  • Pytorch 2.5.0+cu124
  • Datasets 3.0.1
  • Tokenizers 0.20.1
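
Because the framework versions list PEFT, this repository presumably contains an adapter rather than full model weights. Below is a minimal sketch of attaching it to the base model; the adapter repo id is taken from the model page's model tree, and the exact loading options (dtype, device placement) are left at their defaults.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "NousResearch/Meta-Llama-3-8B-Alternate-Tokenizer"
adapter_id = "lesso01/167c575a-5c70-46aa-a2b2-0e7569616426"  # repo id from the model tree

# Load the base model and tokenizer, then attach the PEFT adapter on top.
base = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = PeftModel.from_pretrained(base, adapter_id)
```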
