---
base_model: slm-research-vn/Qwen2-7B-Instruct-SPPO-Function-call-v2.1
datasets:
- slm-research-vn/dpo-format-function-calling-v4
- slm-research-vn/dpo-format-glaive-code-assistant-v3-with-mistral-large-slm-iter4
- argilla/dpo-mix-7k
library_name: peft
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
model-index:
- name: Qwen2-7B-Instruct-SPPO-Function-call-v2.4
  results: []
---

# Qwen2-7B-Instruct-SPPO-Function-call-v2.4

This model is a fine-tuned version of [slm-research-vn/Qwen2-7B-Instruct-SPPO-Function-call-v2.1](https://huggingface.co./slm-research-vn/Qwen2-7B-Instruct-SPPO-Function-call-v2.1) on the slm-research-vn/dpo-format-function-calling-v4, slm-research-vn/dpo-format-glaive-code-assistant-v3-with-mistral-large-slm-iter4, and argilla/dpo-mix-7k datasets.
It achieves the following results on the evaluation set:
- Loss: 0.3152
- Rewards/chosen: 1.9961
- Rewards/rejected: 0.2161
- Rewards/accuracies: 0.8815
- Rewards/margins: 1.7800
- Logps/rejected: -267.1725
- Logps/chosen: -202.5304
- Logits/rejected: -0.6205
- Logits/chosen: -0.6185

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6534        | 0.1020 | 100  | 0.6139          | 0.2871         | 0.0961           | 0.7572             | 0.1911          | -269.5727      | -236.7095    | -0.6762         | -0.6844       |
| 0.4902        | 0.2041 | 200  | 0.4530          | 1.4421         | 0.5735           | 0.8064             | 0.8685          | -260.0234      | -213.6108    | -0.6513         | -0.6502       |
| 0.391         | 0.3061 | 300  | 0.3935          | 1.9109         | 0.6931           | 0.8382             | 1.2178          | -257.6317      | -204.2344    | -0.6321         | -0.6298       |
| 0.3497        | 0.4082 | 400  | 0.3633          | 1.9715         | 0.5740           | 0.8468             | 1.3975          | -260.0141      | -203.0221    | -0.6323         | -0.6313       |
| 0.3378        | 0.5102 | 500  | 0.3421          | 2.0346         | 0.4602           | 0.8699             | 1.5744          | -262.2907      | -201.7610    | -0.6197         | -0.6103       |
| 0.2904        | 0.6122 | 600  | 0.3287          | 1.9449         | 0.3083           | 0.8757             | 1.6366          | -265.3278      | -203.5543    | -0.6221         | -0.6159       |
| 0.3053        | 0.7143 | 700  | 0.3207          | 1.9933         | 0.2606           | 0.8902             | 1.7327          | -266.2818      | -202.5857    | -0.6162         | -0.6111       |
| 0.2655        | 0.8163 | 800  | 0.3158          | 1.9845         | 0.2262           | 0.8815             | 1.7583          | -266.9698      | -202.7614    | -0.6127         | -0.6026       |
| 0.2943        | 0.9184 | 900  | 0.3144          | 1.9968         | 0.2178           | 0.8844             | 1.7789          | -267.1377      | -202.5171    | -0.6136         | -0.6052       |

### Framework versions

- PEFT 0.12.0
- Transformers 4.44.0
- Pytorch 2.3.1+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
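The `trl`/`dpo` tags above indicate training with the standard DPO objective: the reward columns are β-scaled log-probability ratios between the policy and the reference model, and the loss is the negative log-sigmoid of the chosen-vs-rejected reward margin. A minimal sketch of how those columns relate, assuming that standard formulation; the β value and log-probabilities below are illustrative, not taken from this run:

```python
import math

def dpo_metrics(policy_chosen_logp, policy_rejected_logp,
                ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Compute DPO loss and implicit rewards from sequence log-probabilities.

    Rewards/chosen and Rewards/rejected in the table correspond to the
    beta-scaled policy-vs-reference log-probability ratios; Rewards/margins
    is their difference, and the loss is -log sigmoid(margin).
    """
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)
    return loss, reward_chosen, reward_rejected, margin

# Illustrative numbers only: a larger margin between chosen and rejected
# completions drives the loss toward zero.
loss, rc, rr, m = dpo_metrics(-200.0, -270.0, -220.0, -270.0)
```

With these toy inputs the chosen reward is 2.0, the rejected reward is 0.0, and the loss is -log σ(2.0) ≈ 0.127, mirroring how the table's loss falls as the reward margin grows over training.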