mistral-sft-dpo-v
This model is a fine-tuned version of AmberYifan/mistral-safe-sft-full on the AmberYifan/dpo-v dataset. It achieves the following results on the evaluation set (the DPO reward metrics are explained after the list):
- Loss: 0.5826
- Rewards/chosen: -1.5259
- Rewards/rejected: -2.2192
- Rewards/accuracies: 0.6943
- Rewards/margins: 0.6934
- Logps/rejected: -114.9667
- Logps/chosen: -126.3336
- Logits/rejected: -2.7268
- Logits/chosen: -2.7376
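For readers unfamiliar with DPO metrics: assuming the usual TRL-style DPO logging (the card does not say which trainer produced these numbers), the per-example reward is the β-scaled log-probability ratio between the policy and the frozen reference (here the SFT model), and the margin is the chosen-minus-rejected difference; the β used for this run is not reported, so it appears symbolically below. For example, the final margin is -1.5259 - (-2.2192) ≈ 0.6934.

$$
r_\theta(x, y) = \beta\left(\log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x)\right),
\qquad
\text{margins} = r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}})
$$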
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training (a hedged reproduction sketch follows the list):
- learning_rate: 5e-07
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
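The sketch below shows how these hyperparameters would map onto TRL's DPOTrainer. This is an assumption for illustration only: the card does not state which training framework was used, the DPO beta is not reported (the value below is a placeholder), and argument names differ across trl versions.

```python
# Hedged reproduction sketch -- NOT the author's original training script.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base = "AmberYifan/mistral-safe-sft-full"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)
ref_model = AutoModelForCausalLM.from_pretrained(base)  # frozen reference policy

# Expected preference columns: prompt / chosen / rejected
dataset = load_dataset("AmberYifan/dpo-v")

args = DPOConfig(
    output_dir="mistral-sft-dpo-v",
    learning_rate=5e-7,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=1,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
    beta=0.1,  # placeholder: the actual DPO beta is not reported in the card
)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset.get("test"),
    tokenizer=tokenizer,  # newer trl releases call this `processing_class`
)
trainer.train()
```

Launched across 4 GPUs (e.g. via accelerate), the per-device batch size of 8 reproduces the total train batch size of 32 listed above.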
Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6556 | 0.1280 | 200 | 0.6150 | -0.3457 | -0.5979 | 0.6831 | 0.2522 | -98.7538 | -114.5322 | -2.8085 | -2.8220 |
| 0.6305 | 0.2559 | 400 | 0.5884 | -0.6799 | -1.1729 | 0.6959 | 0.4930 | -104.5036 | -117.8738 | -2.8122 | -2.8321 |
| 0.6374 | 0.3839 | 600 | 0.5879 | -1.1220 | -1.7271 | 0.6799 | 0.6051 | -110.0456 | -122.2947 | -2.7503 | -2.7680 |
| 0.5953 | 0.5118 | 800 | 0.5857 | -1.1997 | -1.8000 | 0.6959 | 0.6004 | -110.7746 | -123.0715 | -2.7540 | -2.7703 |
| 0.5874 | 0.6398 | 1000 | 0.5864 | -1.2587 | -1.8977 | 0.6919 | 0.6390 | -111.7514 | -123.6620 | -2.7387 | -2.7547 |
| 0.5937 | 0.7678 | 1200 | 0.5853 | -1.4590 | -2.1314 | 0.6943 | 0.6724 | -114.0883 | -125.6648 | -2.7109 | -2.7243 |
| 0.6276 | 0.8957 | 1400 | 0.5845 | -1.4589 | -2.1361 | 0.6998 | 0.6771 | -114.1350 | -125.6641 | -2.7129 | -2.7248 |
Framework versions
- Transformers 4.43.3
- Pytorch 2.2.2+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
Model tree for AmberYifan/mistral-sft-dpo-v
- Base model: mistralai/Mistral-7B-v0.1
- Finetuned from: AmberYifan/mistral-safe-sft-full
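A minimal loading sketch for the resulting model, assuming the standard transformers causal-LM API (the card does not include usage code; the prompt is a placeholder):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AmberYifan/mistral-sft-dpo-v"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",  # requires accelerate; drop this argument if unavailable
)

prompt = "Write a short, safe answer to: how do I reset a forgotten password?"  # placeholder prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```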