
AmberYifan/Mistral-7B-v0.1-dpo-10k

This model is a version of mistralai/Mistral-7B-v0.1 fine-tuned with DPO (per the model name and the reward metrics below) on an unknown preference dataset. It achieves the following results on the evaluation set (a sketch of how these metrics are typically computed follows the list):

  • Loss: 0.7523
  • Rewards/real: 2.2447
  • Rewards/generated: 1.4806
  • Rewards/accuracies: 0.6154
  • Rewards/margins: 0.7641
  • Logps/generated: -106.5099
  • Logps/real: -116.4675
  • Logits/generated: -2.3563
  • Logits/real: -2.3976
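
The card does not define these metrics, but the names match the standard DPO quantities logged by TRL's `DPOTrainer`. Below is a minimal sketch, assuming that formulation, of how such numbers are derived from summed per-response log-probabilities; the function name and `beta=0.1` (TRL's default) are assumptions, not values from the card.

```python
import torch.nn.functional as F

def dpo_eval_metrics(policy_real_logps, policy_gen_logps,
                     ref_real_logps, ref_gen_logps, beta=0.1):
    # Implicit rewards: beta * log-ratio of policy to reference model,
    # for the preferred ("real") and dispreferred ("generated") responses.
    rewards_real = beta * (policy_real_logps - ref_real_logps)
    rewards_generated = beta * (policy_gen_logps - ref_gen_logps)
    margins = rewards_real - rewards_generated
    # DPO loss is -log(sigmoid(margin)); accuracy is the fraction of pairs
    # where the preferred response earns the higher implicit reward.
    loss = -F.logsigmoid(margins).mean()
    return {
        "loss": loss.item(),
        "rewards/real": rewards_real.mean().item(),
        "rewards/generated": rewards_generated.mean().item(),
        "rewards/accuracies": (margins > 0).float().mean().item(),
        "rewards/margins": margins.mean().item(),
    }
```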

Model description

More information needed

Intended uses & limitations

More information needed
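
The card gives no usage guidance; as a starting point, here is a minimal loading sketch with `transformers`, using the hub id `AmberYifan/Mistral-7B-v0.1-dpo-10k` and the BF16 weights the repository ships. The prompt is an arbitrary placeholder.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AmberYifan/Mistral-7B-v0.1-dpo-10k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```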

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of an equivalent training setup follows the list):

  • learning_rate: 5e-07
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
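
The training framework is not stated; below is a minimal sketch of an equivalent run, assuming TRL's `DPOTrainer` (whose logged metrics match those in the results table). `DPOConfig` subclasses `transformers.TrainingArguments`, so the card's hyperparameters map directly onto its fields. The two-example `preference_dataset` is a hypothetical stand-in for the unknown training data, and the argument name `tokenizer` follows TRL 0.9-era releases (newer versions call it `processing_class`).

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

# Hypothetical stand-in for the (unknown) preference dataset.
preference_dataset = Dataset.from_dict({
    "prompt": ["What is DPO?", "Name a French city."],
    "chosen": ["Direct Preference Optimization.", "Paris."],
    "rejected": ["No idea.", "London."],
})

# Run under `accelerate launch` on 4 GPUs to reproduce the card's
# effective train batch size: 4 per device x 4 devices x 2 accumulation
# steps = 32. Adam betas/epsilon are the TrainingArguments defaults.
args = DPOConfig(
    output_dir="Mistral-7B-v0.1-dpo-10k",
    learning_rate=5e-7,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,
)

trainer = DPOTrainer(
    model=model,          # reference model is cloned internally if omitted
    args=args,
    train_dataset=preference_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```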

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/real | Rewards/generated | Rewards/accuracies | Rewards/margins | Logps/generated | Logps/real | Logits/generated | Logits/real |
|:-------------:|:------:|:----:|:---------------:|:------------:|:-----------------:|:------------------:|:---------------:|:---------------:|:----------:|:----------------:|:-----------:|
| 0.74          | 0.1984 | 62   | 0.7414          | 1.1355       | 0.8829            | 0.6154             | 0.2526          | -112.4863       | -127.5589  | -2.4229          | -2.4711     |
| 0.7524        | 0.3968 | 124  | 0.7002          | 1.7305       | 1.2540            | 0.6923             | 0.4765          | -108.7756       | -121.6096  | -2.5561          | -2.5864     |
| 0.8028        | 0.5952 | 186  | 0.7025          | 1.7197       | 1.2525            | 0.6538             | 0.4673          | -108.7909       | -121.7167  | -2.4102          | -2.3984     |
| 0.7502        | 0.7936 | 248  | 0.7088          | 1.5388       | 0.9514            | 0.6346             | 0.5875          | -111.8017       | -123.5257  | -2.5032          | -2.5135     |
| 0.8621        | 0.992  | 310  | 0.7444          | 1.5171       | 1.1213            | 0.6731             | 0.3957          | -110.1023       | -123.7435  | -2.4965          | -2.5022     |
| 0.3246        | 1.1904 | 372  | 0.7215          | 2.3223       | 1.7036            | 0.6731             | 0.6187          | -104.2799       | -115.6916  | -2.5671          | -2.5848     |
| 0.3153        | 1.3888 | 434  | 0.7150          | 2.3474       | 1.7021            | 0.6538             | 0.6453          | -104.2945       | -115.4398  | -2.4999          | -2.5255     |
| 0.4053        | 1.5872 | 496  | 0.7083          | 2.2991       | 1.6619            | 0.6731             | 0.6372          | -104.6970       | -115.9233  | -2.4039          | -2.4069     |
| 0.3611        | 1.7856 | 558  | 0.7119          | 2.3331       | 1.7045            | 0.6731             | 0.6286          | -104.2702       | -115.5829  | -2.4323          | -2.4364     |
| 0.3933        | 1.984  | 620  | 0.7168          | 2.3292       | 1.7024            | 0.6731             | 0.6268          | -104.2917       | -115.6223  | -2.4321          | -2.4267     |
| 0.226         | 2.1824 | 682  | 0.7430          | 2.2194       | 1.4536            | 0.6346             | 0.7658          | -106.7797       | -116.7200  | -2.3994          | -2.4211     |
| 0.2117        | 2.3808 | 744  | 0.7449          | 2.1435       | 1.3976            | 0.5962             | 0.7459          | -107.3397       | -117.4795  | -2.4077          | -2.4527     |
| 0.2304        | 2.5792 | 806  | 0.7553          | 2.2242       | 1.4834            | 0.5769             | 0.7408          | -106.4812       | -116.6720  | -2.3411          | -2.3926     |
| 0.2423        | 2.7776 | 868  | 0.7526          | 2.2896       | 1.5597            | 0.5962             | 0.7299          | -105.7187       | -116.0179  | -2.3574          | -2.3974     |
| 0.2881        | 2.976  | 930  | 0.7523          | 2.2447       | 1.4806            | 0.6154             | 0.7641          | -106.5099       | -116.4675  | -2.3563          | -2.3976     |

Evaluation ran every 62 optimizer steps; 930 steps over 3 epochs at an effective batch size of 32 works out to roughly 9,900 training examples, consistent with the "10k" in the model name.

Framework versions

  • Transformers 4.43.3
  • PyTorch 2.2.2+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1