---
base_model: slm-research-vn/Qwen2-7B-Instruct-SPPO-Function-call-v2.1
datasets:
- slm-research-vn/dpo-format-function-calling-v4
- slm-research-vn/dpo-format-glaive-code-assistant-v3-with-mistral-large-slm-iter4
- argilla/dpo-mix-7k
library_name: peft
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
model-index:
- name: Qwen2-7B-Instruct-SPPO-Function-call-v2.4
  results: []
---

# Qwen2-7B-Instruct-SPPO-Function-call-v2.4

This model is a fine-tuned version of [slm-research-vn/Qwen2-7B-Instruct-SPPO-Function-call-v2.1](https://huggingface.co./slm-research-vn/Qwen2-7B-Instruct-SPPO-Function-call-v2.1) on the slm-research-vn/dpo-format-function-calling-v4, slm-research-vn/dpo-format-glaive-code-assistant-v3-with-mistral-large-slm-iter4, and argilla/dpo-mix-7k datasets.
It achieves the following results on the evaluation set:
- Loss: 0.3152
- Rewards/chosen: 1.9961
- Rewards/rejected: 0.2161
- Rewards/accuracies: 0.8815
- Rewards/margins: 1.7800
- Logps/rejected: -267.1725
- Logps/chosen: -202.5304
- Logits/rejected: -0.6205
- Logits/chosen: -0.6185

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-06
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6534        | 0.1020 | 100  | 0.6139          | 0.2871         | 0.0961           | 0.7572             | 0.1911          | -269.5727      | -236.7095    | -0.6762         | -0.6844       |
| 0.4902        | 0.2041 | 200  | 0.4530          | 1.4421         | 0.5735           | 0.8064             | 0.8685          | -260.0234      | -213.6108    | -0.6513         | -0.6502       |
| 0.391         | 0.3061 | 300  | 0.3935          | 1.9109         | 0.6931           | 0.8382             | 1.2178          | -257.6317      | -204.2344    | -0.6321         | -0.6298       |
| 0.3497        | 0.4082 | 400  | 0.3633          | 1.9715         | 0.5740           | 0.8468             | 1.3975          | -260.0141      | -203.0221    | -0.6323         | -0.6313       |
| 0.3378        | 0.5102 | 500  | 0.3421          | 2.0346         | 0.4602           | 0.8699             | 1.5744          | -262.2907      | -201.7610    | -0.6197         | -0.6103       |
| 0.2904        | 0.6122 | 600  | 0.3287          | 1.9449         | 0.3083           | 0.8757             | 1.6366          | -265.3278      | -203.5543    | -0.6221         | -0.6159       |
| 0.3053        | 0.7143 | 700  | 0.3207          | 1.9933         | 0.2606           | 0.8902             | 1.7327          | -266.2818      | -202.5857    | -0.6162         | -0.6111       |
| 0.2655        | 0.8163 | 800  | 0.3158          | 1.9845         | 0.2262           | 0.8815             | 1.7583          | -266.9698      | -202.7614    | -0.6127         | -0.6026       |
| 0.2943        | 0.9184 | 900  | 0.3144          | 1.9968         | 0.2178           | 0.8844             | 1.7789          | -267.1377      | -202.5171    | -0.6136         | -0.6052       |

### Framework versions

- PEFT 0.12.0
- Transformers 4.44.0
- Pytorch 2.3.1+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
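The `trl`/`dpo` tags above indicate training with the standard DPO objective: the reward columns are β-scaled log-probability ratios between the policy and the reference model, and the loss is the negative log-sigmoid of the chosen-vs-rejected reward margin. A minimal sketch of how those columns relate, assuming that standard formulation; the β value and log-probabilities below are illustrative, not taken from this run:

```python
import math

def dpo_metrics(policy_chosen_logp, policy_rejected_logp,
                ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Compute DPO loss and implicit rewards from sequence log-probabilities.

    Rewards/chosen and Rewards/rejected in the table correspond to the
    beta-scaled policy-vs-reference log-probability ratios; Rewards/margins
    is their difference, and the loss is -log sigmoid(margin).
    """
    reward_chosen = beta * (policy_chosen_logp - ref_chosen_logp)
    reward_rejected = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = reward_chosen - reward_rejected
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)
    return loss, reward_chosen, reward_rejected, margin

# Illustrative numbers only: a larger margin between chosen and rejected
# completions drives the loss toward zero.
loss, rc, rr, m = dpo_metrics(-200.0, -270.0, -220.0, -270.0)
```

With these toy inputs the chosen reward is 2.0, the rejected reward is 0.0, and the loss is -log σ(2.0) ≈ 0.127, mirroring how the table's loss falls as the reward margin grows over training.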