sft_dpo_fs

This model is a fine-tuned version of mistralai/Mistral-Nemo-Instruct-2407 on the heat_transfer_dpo dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1535
  • Rewards/chosen: 17.2823
  • Rewards/rejected: 11.3004
  • Rewards/accuracies: 0.9610
  • Rewards/margins: 5.9819
  • Logps/chosen: -2.2063
  • Logps/rejected: -60.6033
  • Logits/chosen: 0.0035
  • Logits/rejected: -0.0076
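
The reward figures above are related by simple arithmetic. Assuming the usual DPO logging convention (the card does not state it), the margin is the chosen reward minus the rejected reward. A minimal sketch that checks this against the reported values; the variable names are illustrative only:

```python
# Evaluation metrics copied from the list above.
rewards_chosen = 17.2823
rewards_rejected = 11.3004
reported_margin = 5.9819

# Assumption: margin = chosen reward - rejected reward (common DPO logging convention).
computed_margin = rewards_chosen - rewards_rejected

# Allow for rounding in the reported four-decimal values.
margin_matches = abs(computed_margin - reported_margin) < 1e-3
print(f"computed margin: {computed_margin:.4f}, matches reported: {margin_matches}")
```

A positive margin together with a reward accuracy of 0.9610 suggests the model ranks the chosen response above the rejected one on most evaluation pairs.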

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • total_train_batch_size: 8
  • total_eval_batch_size: 8
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
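
The derived values in this list can be reproduced from the per-device settings. The sketch below is illustrative and rests on assumptions not stated in the card: no gradient accumulation, a step count estimated from the results table, and a warmup length equal to the ratio times the total steps, rounded up.

```python
import math

# Values copied from the hyperparameter list above.
train_batch_size = 4      # per-device batch size
num_devices = 2
warmup_ratio = 0.1

# Assumption: no gradient accumulation, since none is listed in the card.
gradient_accumulation_steps = 1
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps

# Assumption: total steps for the single epoch are estimated from the
# results table row "epoch 0.96 at step 1080"; the card does not state it.
estimated_total_steps = round(1080 / 0.96)

# Assumption: warmup length is the ratio times the total step count, rounded up.
estimated_warmup_steps = math.ceil(estimated_total_steps * warmup_ratio)

print(total_train_batch_size, estimated_total_steps, estimated_warmup_steps)
```

Under these assumptions the effective batch size agrees with the reported total_train_batch_size of 8.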

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/chosen | Logps/rejected | Logits/chosen | Logits/rejected |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.3835 | 0.0533 | 60 | 0.3287 | 17.1482 | 15.7095 | 0.9280 | 1.4386 | -3.5472 | -16.5118 | -0.5827 | -0.5914 |
| 0.2552 | 0.1067 | 120 | 0.1900 | 17.1335 | 13.7535 | 0.9320 | 3.3799 | -3.6944 | -36.0722 | -0.2065 | -0.2218 |
| 0.2362 | 0.16 | 180 | 0.2024 | 17.0614 | 11.9722 | 0.9510 | 5.0892 | -4.4150 | -53.8850 | -0.1087 | -0.1222 |
| 0.1781 | 0.2133 | 240 | 0.1546 | 17.0620 | 12.2862 | 0.9500 | 4.7758 | -4.4089 | -50.7448 | -0.1243 | -0.1381 |
| 0.265 | 0.2667 | 300 | 0.1536 | 17.2493 | 12.6444 | 0.9440 | 4.6050 | -2.5355 | -47.1637 | -0.1744 | -0.1856 |
| 0.1605 | 0.32 | 360 | 0.3194 | 17.3612 | 12.2655 | 0.9210 | 5.0958 | -1.4165 | -50.9525 | -0.1062 | -0.1173 |
| 0.2894 | 0.3733 | 420 | 0.1679 | 17.3116 | 12.2496 | 0.9450 | 5.0620 | -1.9131 | -51.1113 | -0.0905 | -0.1026 |
| 0.1149 | 0.4267 | 480 | 0.2951 | 17.0540 | 11.9844 | 0.9230 | 5.0696 | -4.4890 | -53.7628 | -0.0770 | -0.0883 |
| 0.0384 | 0.48 | 540 | 0.1739 | 17.2042 | 12.1334 | 0.9490 | 5.0708 | -2.9873 | -52.2731 | -0.0512 | -0.0612 |
| 0.4008 | 0.5333 | 600 | 0.1706 | 17.2853 | 11.6981 | 0.9470 | 5.5872 | -2.1760 | -56.6266 | -0.0358 | -0.0469 |
| 0.1678 | 0.5867 | 660 | 0.2050 | 17.2021 | 11.5656 | 0.9450 | 5.6365 | -3.0082 | -57.9516 | -0.0160 | -0.0270 |
| 0.2272 | 0.64 | 720 | 0.1402 | 17.3928 | 11.7696 | 0.9520 | 5.6233 | -1.1005 | -55.9117 | -0.0229 | -0.0322 |
| 0.1915 | 0.6933 | 780 | 0.2441 | 17.3947 | 11.7656 | 0.9320 | 5.6290 | -1.0823 | -55.9507 | -0.0166 | -0.0266 |
| 0.0635 | 0.7467 | 840 | 0.1689 | 17.3812 | 11.5343 | 0.9450 | 5.8469 | -1.2169 | -58.2643 | -0.0111 | -0.0217 |
| 0.1703 | 0.8 | 900 | 0.1400 | 17.3271 | 11.3817 | 0.9610 | 5.9455 | -1.7577 | -59.7906 | 0.0002 | -0.0105 |
| 0.1138 | 0.8533 | 960 | 0.1441 | 17.3149 | 11.3432 | 0.9630 | 5.9718 | -1.8795 | -60.1756 | 0.0015 | -0.0094 |
| 0.0513 | 0.9067 | 1020 | 0.1412 | 17.3211 | 11.3263 | 0.9610 | 5.9948 | -1.8178 | -60.3445 | 0.0045 | -0.0065 |
| 0.1189 | 0.96 | 1080 | 0.1508 | 17.2887 | 11.3001 | 0.9610 | 5.9886 | -2.1420 | -60.6061 | 0.0074 | -0.0036 |
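
To compare checkpoints, the evaluation rows in the table above can be scanned for the lowest validation loss and the highest reward accuracy. The sketch below is only illustrative: it copies three columns from the table, and the tuple layout and variable names are assumptions, not part of the training code.

```python
# (step, validation loss, reward accuracy) copied from the results table above.
eval_rows = [
    (60, 0.3287, 0.9280), (120, 0.1900, 0.9320), (180, 0.2024, 0.9510),
    (240, 0.1546, 0.9500), (300, 0.1536, 0.9440), (360, 0.3194, 0.9210),
    (420, 0.1679, 0.9450), (480, 0.2951, 0.9230), (540, 0.1739, 0.9490),
    (600, 0.1706, 0.9470), (660, 0.2050, 0.9450), (720, 0.1402, 0.9520),
    (780, 0.2441, 0.9320), (840, 0.1689, 0.9450), (900, 0.1400, 0.9610),
    (960, 0.1441, 0.9630), (1020, 0.1412, 0.9610), (1080, 0.1508, 0.9610),
]

# Checkpoint with the lowest validation loss.
best_by_loss = min(eval_rows, key=lambda row: row[1])

# Checkpoint with the highest reward accuracy.
best_by_accuracy = max(eval_rows, key=lambda row: row[2])

print("lowest validation loss:", best_by_loss)
print("highest reward accuracy:", best_by_accuracy)
```

Validation loss fluctuates between evaluations (for example, 0.3194 at step 360 and 0.1400 at step 900), so the final row is not automatically the best one under either criterion.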

Framework versions

  • PEFT 0.12.0
  • Transformers 4.46.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.21.0
  • Tokenizers 0.20.1
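
When reproducing results, it can help to compare a local environment with the versions listed above. The following is a minimal sketch using only the standard library; the PyPI distribution names (for example, `torch` for Pytorch) and the helper name `compare_versions` are assumptions made for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above. The distribution names are an
# assumption (for example, "Pytorch" is published as "torch").
pinned_versions = {
    "peft": "0.12.0",
    "transformers": "4.46.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.21.0",
    "tokenizers": "0.20.1",
}


def compare_versions(pinned):
    """Return {package: (pinned, installed or None)} for each pinned package."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed)
    return report


for name, (wanted, installed) in compare_versions(pinned_versions).items():
    status = "ok" if installed == wanted else "differs"
    print(f"{name}: pinned {wanted}, installed {installed} ({status})")
```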

Model tree for Howard881010/heat_transfer_sft_dpo_fs