
Llama-2-7b-hf-DPO-LookAhead3_FullEval_TTree1.4_TLoop0.7_TEval0.2_Filter0.2_V2.0

This model is a fine-tuned version of meta-llama/Llama-2-7b-hf on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6353
  • Rewards/chosen: -3.2199
  • Rewards/rejected: -3.7792
  • Rewards/accuracies: 0.625
  • Rewards/margins: 0.5593
  • Logps/rejected: -145.3183
  • Logps/chosen: -164.8658
  • Logits/rejected: -1.1220
  • Logits/chosen: -1.0854
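For reference, Rewards/margins is Rewards/chosen minus Rewards/rejected (-3.2199 - (-3.7792) = 0.5593), i.e. the average gap in implicit DPO reward between preferred and dispreferred completions. Since the framework versions below include PEFT, this repository appears to host an adapter rather than full model weights. A minimal loading sketch, assuming you have access to the gated meta-llama/Llama-2-7b-hf base model:

```python
# Minimal inference sketch: load the base model, then apply this DPO adapter.
# Assumes access to the gated meta-llama/Llama-2-7b-hf weights and that this
# repo contains a PEFT adapter (as the framework versions below suggest).
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "meta-llama/Llama-2-7b-hf"
ADAPTER = "LBK95/Llama-2-7b-hf-DPO-LookAhead3_FullEval_TTree1.4_TLoop0.7_TEval0.2_Filter0.2_V2.0"

tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, ADAPTER)

inputs = tokenizer("Hello, world.", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Drop device_map="auto" and the float16 dtype to load on CPU instead of sharding across available GPUs.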

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged reconstruction of the training setup follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 10
  • num_epochs: 3
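
The card does not say which script produced this run. Given the DPO metrics and the PEFT-era framework versions below, a plausible reconstruction uses TRL's DPOTrainer; in the sketch below only the hyperparameters are taken from this card, while the preference data, LoRA config, and DPO beta are placeholder assumptions.

```python
# Hypothetical reconstruction of the training run with TRL's DPOTrainer.
# Only the hyperparameters mirror the card; the preference pairs and LoRA
# settings here are illustrative placeholders.
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer.pad_token = tokenizer.eos_token  # Llama 2 has no pad token by default

# Placeholder preference pairs; DPO expects prompt/chosen/rejected columns.
pairs = Dataset.from_dict({
    "prompt": ["What is 2 + 2?"],
    "chosen": ["2 + 2 equals 4."],
    "rejected": ["2 + 2 equals 5."],
})

args = DPOConfig(
    output_dir="dpo-out",
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=2,   # 2 x 2 = total train batch size of 4
    lr_scheduler_type="cosine",
    warmup_steps=10,
    num_train_epochs=3,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=pairs,
    eval_dataset=pairs,
    tokenizer=tokenizer,
    # With a PEFT adapter, TRL uses the frozen base model as the implicit
    # reference, so no separate ref_model is needed.
    peft_config=LoraConfig(task_type="CAUSAL_LM"),  # adapter settings are an assumption
)
trainer.train()
```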

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6639        | 0.3037 | 53   | 0.6613          | 0.0051         | -0.0607          | 0.875              | 0.0658          | -108.1335      | -132.6158    | -0.6137         | -0.5723       |
| 0.6476        | 0.6074 | 106  | 0.6171          | -0.2010        | -0.3748          | 0.625              | 0.1738          | -111.2741      | -134.6767    | -0.6554         | -0.6155       |
| 0.6552        | 0.9112 | 159  | 0.6850          | -0.4026        | -0.4336          | 0.5                | 0.0310          | -111.8621      | -136.6923    | -0.6025         | -0.5605       |
| 0.271         | 1.2149 | 212  | 0.5592          | -1.1775        | -1.5117          | 0.75               | 0.3342          | -122.6435      | -144.4414    | -0.6651         | -0.6240       |
| 0.2321        | 1.5186 | 265  | 0.6523          | -1.6722        | -1.8791          | 0.5                | 0.2069          | -126.3177      | -149.3886    | -0.7461         | -0.7056       |
| 0.3961        | 1.8223 | 318  | 0.5176          | -1.1964        | -1.6762          | 0.875              | 0.4798          | -124.2882      | -144.6302    | -0.8107         | -0.7719       |
| 0.1421        | 2.1261 | 371  | 0.6029          | -2.4068        | -2.8869          | 0.625              | 0.4801          | -136.3952      | -156.7344    | -1.0103         | -0.9720       |
| 0.5702        | 2.4298 | 424  | 0.6557          | -3.1785        | -3.6978          | 0.625              | 0.5193          | -144.5047      | -164.4516    | -1.0897         | -1.0539       |
| 0.2376        | 2.7335 | 477  | 0.6353          | -3.2199        | -3.7792          | 0.625              | 0.5593          | -145.3183      | -164.8658    | -1.1220         | -1.0854       |

Framework versions

  • PEFT 0.12.0
  • Transformers 4.44.2
  • Pytorch 2.4.0+cu121
  • Datasets 3.0.0
  • Tokenizers 0.19.1
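
A quick way to check a local environment against these pins (a convenience snippet, not part of the original card):

```python
# Compare installed package versions against the ones this card was built with.
from importlib.metadata import PackageNotFoundError, version

pins = {
    "peft": "0.12.0",
    "transformers": "4.44.2",
    "torch": "2.4.0+cu121",
    "datasets": "3.0.0",
    "tokenizers": "0.19.1",
}
for pkg, want in pins.items():
    try:
        have = version(pkg)
    except PackageNotFoundError:
        have = "not installed"
    flag = "" if have == want else "  <-- differs"
    print(f"{pkg}: installed={have}, card={want}{flag}")
```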