Tags: PEFT · Safetensors · qwen2 · alignment-handbook · trl · dpo · Generated from Trainer

Qwen2-7B-Instruct-SPPO-Function-call-v2.4

This model is a DPO fine-tuned version of slm-research-vn/Qwen2-7B-Instruct-SPPO-Function-call-v2.1, trained on the slm-research-vn/dpo-format-function-calling-v4, slm-research-vn/dpo-format-glaive-code-assistant-v3-with-mistral-large-slm-iter4, and argilla/dpo-mix-7k datasets. It achieves the following results on the evaluation set (a minimal loading sketch follows the metrics):

  • Loss: 0.3152
  • Rewards/chosen: 1.9961
  • Rewards/rejected: 0.2161
  • Rewards/accuracies: 0.8815
  • Rewards/margins: 1.7800
  • Logps/rejected: -267.1725
  • Logps/chosen: -202.5304
  • Logits/rejected: -0.6205
  • Logits/chosen: -0.6185
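
Since this repository ships a PEFT adapter on top of slm-research-vn/Qwen2-7B-Instruct-SPPO-Function-call-v2.1, one quick way to try it is to load the adapter with peft's AutoPeftModelForCausalLM. The snippet below is an untested sketch: the dtype, device placement, example prompt, and generation settings are illustrative assumptions, and it assumes the tokenizer files are present in the adapter repository (otherwise load the tokenizer from the base model).

```python
# Minimal sketch: load the DPO-tuned PEFT adapter on top of its Qwen2 base model.
# Repository id and base model come from this card; generation settings are illustrative.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "khongtrunght/Qwen2-7B-Instruct-SPPO-Function-call-v2.4"

model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,  # assumption: precision is not stated on this card
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

# Hypothetical prompt, only to show the chat-template flow.
messages = [{"role": "user", "content": "What is the weather in Hanoi today?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```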

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TRL configuration sketch follows the list):

  • learning_rate: 1e-06
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
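
The card is tagged alignment-handbook/trl, so the hyperparameters above most likely correspond to a TRL DPOConfig along the lines sketched below. This is a rough, untested outline rather than the actual training script: the dataset choice, LoRA settings, and output directory are assumptions the card does not state.

```python
# Rough sketch of a TRL DPO run matching the hyperparameters above (assumptions noted inline).
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_id = "slm-research-vn/Qwen2-7B-Instruct-SPPO-Function-call-v2.1"
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Assumption: one of the preference datasets listed above, already in prompt/chosen/rejected format.
train_dataset = load_dataset("argilla/dpo-mix-7k", split="train")

# Assumption: a LoRA adapter (the card only says PEFT; ranks and target modules are not stated).
peft_config = LoraConfig(task_type="CAUSAL_LM")

args = DPOConfig(
    output_dir="Qwen2-7B-Instruct-SPPO-Function-call-v2.4",
    learning_rate=1e-06,
    per_device_train_batch_size=1,   # train_batch_size: 1
    per_device_eval_batch_size=1,    # eval_batch_size: 1
    gradient_accumulation_steps=4,   # 8 GPUs x 1 x 4 = total train batch size 32
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```

The Adam betas (0.9, 0.999) and epsilon 1e-08 listed above match the Transformers optimizer defaults, so they are not set explicitly in the sketch.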

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---------------|-------|------|-----------------|----------------|------------------|--------------------|-----------------|----------------|--------------|-----------------|---------------|
| 0.6534 | 0.1020 | 100 | 0.6139 | 0.2871 | 0.0961 | 0.7572 | 0.1911 | -269.5727 | -236.7095 | -0.6762 | -0.6844 |
| 0.4902 | 0.2041 | 200 | 0.4530 | 1.4421 | 0.5735 | 0.8064 | 0.8685 | -260.0234 | -213.6108 | -0.6513 | -0.6502 |
| 0.391  | 0.3061 | 300 | 0.3935 | 1.9109 | 0.6931 | 0.8382 | 1.2178 | -257.6317 | -204.2344 | -0.6321 | -0.6298 |
| 0.3497 | 0.4082 | 400 | 0.3633 | 1.9715 | 0.5740 | 0.8468 | 1.3975 | -260.0141 | -203.0221 | -0.6323 | -0.6313 |
| 0.3378 | 0.5102 | 500 | 0.3421 | 2.0346 | 0.4602 | 0.8699 | 1.5744 | -262.2907 | -201.7610 | -0.6197 | -0.6103 |
| 0.2904 | 0.6122 | 600 | 0.3287 | 1.9449 | 0.3083 | 0.8757 | 1.6366 | -265.3278 | -203.5543 | -0.6221 | -0.6159 |
| 0.3053 | 0.7143 | 700 | 0.3207 | 1.9933 | 0.2606 | 0.8902 | 1.7327 | -266.2818 | -202.5857 | -0.6162 | -0.6111 |
| 0.2655 | 0.8163 | 800 | 0.3158 | 1.9845 | 0.2262 | 0.8815 | 1.7583 | -266.9698 | -202.7614 | -0.6127 | -0.6026 |
| 0.2943 | 0.9184 | 900 | 0.3144 | 1.9968 | 0.2178 | 0.8844 | 1.7789 | -267.1377 | -202.5171 | -0.6136 | -0.6052 |

Framework versions

  • PEFT 0.12.0
  • Transformers 4.44.0
  • PyTorch 2.3.1+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1