# phi_1.5_dpo_v3
This model is a DPO fine-tuned version of [microsoft/phi-1_5](https://huggingface.co/microsoft/phi-1_5); the training dataset is not specified. It achieves the following results on the evaluation set:
- Loss: 0.0000
- Rewards/chosen: -0.6229
- Rewards/rejected: -20.0554
- Rewards/accuracies: 1.0
- Rewards/margins: 19.4326
- Logps/rejected: -599.3906
- Logps/chosen: -97.1858
- Logits/rejected: 4.0995
- Logits/chosen: 5.4831
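Under the standard DPO definitions (as used by TRL's `DPOTrainer`, which this card's metric names suggest), `Rewards/margins` is simply `Rewards/chosen - Rewards/rejected`, and the per-pair loss is `-log(sigmoid(margin))`. The sketch below checks that the reported numbers are internally consistent; the exact values are an assumption up to rounding of the reported metrics:

```python
import math

# Final evaluation metrics reported above.
chosen, rejected = -0.6229, -20.0554

# In DPO, the reported margin is simply chosen - rejected.
margin = chosen - rejected
print(round(margin, 4))  # ~19.4325, matching the reported 19.4326 up to rounding

# The DPO pairwise loss is -log(sigmoid(margin)); a margin near 19.4
# makes the loss vanishingly small, which is why it rounds to 0.0000.
loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
print(f"{loss:.4f}")  # 0.0000
```

A margin this large, together with `Rewards/accuracies` of 1.0 from the first logged step onward, indicates the model separates chosen from rejected completions essentially perfectly on this evaluation set.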
## Model description
More information needed
## Intended uses & limitations
More information needed
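Since the card provides no usage details, here is a minimal inference sketch using the `transformers` Auto classes. It assumes the checkpoint is hosted under the repo id `jmukesh99/phi_1.5_dpo_v3` shown in the model tree; running it downloads the model weights:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id, taken from the model tree on this card.
model_id = "jmukesh99/phi_1.5_dpo_v3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short completion from a plain-text prompt.
inputs = tokenizer("Explain what DPO fine-tuning does.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```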
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2
- training_steps: 500
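With `lr_scheduler_type: linear`, 2 warmup steps, and 500 training steps, the learning rate ramps linearly to 2e-4 and then decays linearly to zero. A sketch of that schedule, assuming the formula used by `transformers`' `get_linear_schedule_with_warmup`:

```python
def linear_warmup_lr(step, base_lr=2e-4, warmup_steps=2, total_steps=500):
    """Linear warmup followed by linear decay, mirroring the
    hyperparameters above (assumed to follow transformers'
    get_linear_schedule_with_warmup)."""
    if step < warmup_steps:
        # Ramp from 0 to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr at the end of warmup to 0 at total_steps.
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

print(linear_warmup_lr(1))    # mid-warmup: 1e-4
print(linear_warmup_lr(2))    # warmup complete: full 2e-4
print(linear_warmup_lr(500))  # end of training: 0.0
```

With only 2 warmup steps out of 500, the schedule is effectively a pure linear decay from 2e-4; at batch size 1, 500 steps cover roughly 13.5 epochs of the (unspecified) training set, consistent with the epoch column below.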
### Training results
Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
---|---|---|---|---|---|---|---|---|---|---|---|
0.4948 | 0.27 | 10 | 0.1734 | -0.0170 | -1.7389 | 1.0 | 1.7219 | -416.2255 | -91.1275 | 4.9609 | 5.8425 |
0.0677 | 0.54 | 20 | 0.0006 | -0.1878 | -8.3012 | 1.0 | 8.1134 | -481.8484 | -92.8350 | 4.5966 | 5.7680 |
0.0002 | 0.81 | 30 | 0.0000 | -0.3920 | -14.2987 | 1.0 | 13.9067 | -541.8231 | -94.8774 | 4.3134 | 5.6330 |
0.0001 | 1.08 | 40 | 0.0000 | -0.5124 | -17.3966 | 1.0 | 16.8841 | -572.8019 | -96.0813 | 4.1996 | 5.5630 |
0.0 | 1.35 | 50 | 0.0000 | -0.5702 | -18.7616 | 1.0 | 18.1915 | -586.4525 | -96.6585 | 4.1536 | 5.5312 |
0.0 | 1.62 | 60 | 0.0000 | -0.5923 | -19.3013 | 1.0 | 18.7089 | -591.8488 | -96.8803 | 4.1366 | 5.5189 |
0.0 | 1.89 | 70 | 0.0000 | -0.6007 | -19.5016 | 1.0 | 18.9008 | -593.8517 | -96.9644 | 4.1304 | 5.5141 |
0.0 | 2.16 | 80 | 0.0000 | -0.6059 | -19.6000 | 1.0 | 18.9940 | -594.8359 | -97.0163 | 4.1267 | 5.5114 |
0.0 | 2.43 | 90 | 0.0000 | -0.6070 | -19.6481 | 1.0 | 19.0411 | -595.3173 | -97.0273 | 4.1248 | 5.5089 |
0.0 | 2.7 | 100 | 0.0000 | -0.6084 | -19.6635 | 1.0 | 19.0551 | -595.4709 | -97.0409 | 4.1232 | 5.5085 |
0.0 | 2.97 | 110 | 0.0000 | -0.6088 | -19.6749 | 1.0 | 19.0661 | -595.5847 | -97.0451 | 4.1232 | 5.5081 |
0.0 | 3.24 | 120 | 0.0000 | -0.6088 | -19.6989 | 1.0 | 19.0900 | -595.8249 | -97.0454 | 4.1225 | 5.5062 |
0.0 | 3.51 | 130 | 0.0000 | -0.6088 | -19.7185 | 1.0 | 19.1097 | -596.0208 | -97.0448 | 4.1200 | 5.5053 |
0.0 | 3.78 | 140 | 0.0000 | -0.6091 | -19.7351 | 1.0 | 19.1260 | -596.1869 | -97.0481 | 4.1203 | 5.5043 |
0.0 | 4.05 | 150 | 0.0000 | -0.6097 | -19.7339 | 1.0 | 19.1241 | -596.1747 | -97.0545 | 4.1200 | 5.5044 |
0.0 | 4.32 | 160 | 0.0000 | -0.6101 | -19.7392 | 1.0 | 19.1291 | -596.2282 | -97.0581 | 4.1191 | 5.5041 |
0.0 | 4.59 | 170 | 0.0000 | -0.6095 | -19.7407 | 1.0 | 19.1312 | -596.2433 | -97.0524 | 4.1196 | 5.5041 |
0.0 | 4.86 | 180 | 0.0000 | -0.6127 | -19.7737 | 1.0 | 19.1610 | -596.5731 | -97.0837 | 4.1176 | 5.5019 |
0.0 | 5.14 | 190 | 0.0000 | -0.6138 | -19.7921 | 1.0 | 19.1784 | -596.7576 | -97.0946 | 4.1164 | 5.5004 |
0.0 | 5.41 | 200 | 0.0000 | -0.6132 | -19.7929 | 1.0 | 19.1796 | -596.7647 | -97.0892 | 4.1152 | 5.5001 |
0.0 | 5.68 | 210 | 0.0000 | -0.6115 | -19.7954 | 1.0 | 19.1839 | -596.7902 | -97.0723 | 4.1154 | 5.4998 |
0.0 | 5.95 | 220 | 0.0000 | -0.6129 | -19.8083 | 1.0 | 19.1954 | -596.9189 | -97.0859 | 4.1143 | 5.4990 |
0.0 | 6.22 | 230 | 0.0000 | -0.6153 | -19.8312 | 1.0 | 19.2159 | -597.1479 | -97.1100 | 4.1132 | 5.4973 |
0.0 | 6.49 | 240 | 0.0000 | -0.6142 | -19.8468 | 1.0 | 19.2325 | -597.3038 | -97.0994 | 4.1127 | 5.4970 |
0.0 | 6.76 | 250 | 0.0000 | -0.6141 | -19.8735 | 1.0 | 19.2594 | -597.5714 | -97.0983 | 4.1111 | 5.4953 |
0.0 | 7.03 | 260 | 0.0000 | -0.6148 | -19.8878 | 1.0 | 19.2730 | -597.7144 | -97.1054 | 4.1100 | 5.4941 |
0.0 | 7.3 | 270 | 0.0000 | -0.6164 | -19.8937 | 1.0 | 19.2773 | -597.7730 | -97.1213 | 4.1091 | 5.4937 |
0.0 | 7.57 | 280 | 0.0000 | -0.6176 | -19.9184 | 1.0 | 19.3009 | -598.0203 | -97.1326 | 4.1079 | 5.4924 |
0.0 | 7.84 | 290 | 0.0000 | -0.6191 | -19.9314 | 1.0 | 19.3124 | -598.1504 | -97.1476 | 4.1073 | 5.4910 |
0.0 | 8.11 | 300 | 0.0000 | -0.6155 | -19.9405 | 1.0 | 19.3250 | -598.2412 | -97.1125 | 4.1067 | 5.4906 |
0.0 | 8.38 | 310 | 0.0000 | -0.6184 | -19.9647 | 1.0 | 19.3463 | -598.4835 | -97.1412 | 4.1057 | 5.4891 |
0.0 | 8.65 | 320 | 0.0000 | -0.6201 | -19.9751 | 1.0 | 19.3550 | -598.5868 | -97.1580 | 4.1047 | 5.4883 |
0.0 | 8.92 | 330 | 0.0000 | -0.6189 | -19.9759 | 1.0 | 19.3570 | -598.5950 | -97.1458 | 4.1044 | 5.4881 |
0.0 | 9.19 | 340 | 0.0000 | -0.6209 | -19.9780 | 1.0 | 19.3572 | -598.6162 | -97.1656 | 4.1039 | 5.4880 |
0.0 | 9.46 | 350 | 0.0000 | -0.6196 | -19.9837 | 1.0 | 19.3641 | -598.6727 | -97.1528 | 4.1043 | 5.4878 |
0.0 | 9.73 | 360 | 0.0000 | -0.6194 | -19.9866 | 1.0 | 19.3672 | -598.7023 | -97.1515 | 4.1041 | 5.4878 |
0.0 | 10.0 | 370 | 0.0000 | -0.6199 | -19.9960 | 1.0 | 19.3761 | -598.7965 | -97.1560 | 4.1030 | 5.4867 |
0.0 | 10.27 | 380 | 0.0000 | -0.6209 | -20.0033 | 1.0 | 19.3824 | -598.8690 | -97.1657 | 4.1025 | 5.4862 |
0.0 | 10.54 | 390 | 0.0000 | -0.6205 | -20.0132 | 1.0 | 19.3927 | -598.9681 | -97.1625 | 4.1021 | 5.4857 |
0.0 | 10.81 | 400 | 0.0000 | -0.6226 | -20.0238 | 1.0 | 19.4012 | -599.0746 | -97.1832 | 4.1013 | 5.4849 |
0.0 | 11.08 | 410 | 0.0000 | -0.6207 | -20.0343 | 1.0 | 19.4136 | -599.1791 | -97.1641 | 4.1014 | 5.4846 |
0.0 | 11.35 | 420 | 0.0000 | -0.6215 | -20.0337 | 1.0 | 19.4122 | -599.1733 | -97.1719 | 4.1010 | 5.4847 |
0.0 | 11.62 | 430 | 0.0000 | -0.6212 | -20.0356 | 1.0 | 19.4144 | -599.1924 | -97.1693 | 4.1008 | 5.4845 |
0.0 | 11.89 | 440 | 0.0000 | -0.6216 | -20.0326 | 1.0 | 19.4111 | -599.1625 | -97.1727 | 4.1007 | 5.4847 |
0.0 | 12.16 | 450 | 0.0000 | -0.6219 | -20.0401 | 1.0 | 19.4182 | -599.2375 | -97.1761 | 4.0998 | 5.4838 |
0.0 | 12.43 | 460 | 0.0000 | -0.6225 | -20.0430 | 1.0 | 19.4205 | -599.2663 | -97.1819 | 4.1004 | 5.4836 |
0.0 | 12.7 | 470 | 0.0000 | -0.6230 | -20.0486 | 1.0 | 19.4255 | -599.3220 | -97.1875 | 4.1003 | 5.4836 |
0.0 | 12.97 | 480 | 0.0000 | -0.6225 | -20.0484 | 1.0 | 19.4259 | -599.3201 | -97.1819 | 4.1002 | 5.4834 |
0.0 | 13.24 | 490 | 0.0000 | -0.6209 | -20.0524 | 1.0 | 19.4315 | -599.3601 | -97.1659 | 4.1000 | 5.4831 |
0.0 | 13.51 | 500 | 0.0000 | -0.6229 | -20.0554 | 1.0 | 19.4326 | -599.3906 | -97.1858 | 4.0995 | 5.4831 |
## Framework versions
- Transformers 4.33.0
- PyTorch 2.0.1+cu117
- Datasets 2.1.0
- Tokenizers 0.13.3
## Model tree for jmukesh99/phi_1.5_dpo_v3

Base model: [microsoft/phi-1_5](https://huggingface.co/microsoft/phi-1_5)