---
license: other
base_model: microsoft/phi-1_5
tags:
- generated_from_trainer
model-index:
- name: phi_1.5_dpo_v3
  results: []
---

# phi_1.5_dpo_v3

This model is a version of [microsoft/phi-1_5](https://huggingface.co./microsoft/phi-1_5) fine-tuned with DPO (Direct Preference Optimization) on an unspecified preference dataset.
It achieves the following results on the evaluation set:
- Loss: 0.0000
- Rewards/chosen: -0.6229
- Rewards/rejected: -20.0554
- Rewards/accuracies: 1.0
- Rewards/margins: 19.4326
- Logps/rejected: -599.3906
- Logps/chosen: -97.1858
- Logits/rejected: 4.0995
- Logits/chosen: 5.4831
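
As a hedged illustration of how a checkpoint like this is typically used, the snippet below loads a phi-1_5 fine-tune with the standard `transformers` generation API. The repository id is a placeholder (the card does not state where these weights are hosted), so substitute the real local path or Hub id.

```python
# Minimal inference sketch. The repo id is a placeholder, not taken from this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/phi_1.5_dpo_v3"  # placeholder; replace with the actual path/Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    trust_remote_code=True,  # phi-1_5 relied on custom modeling code in Transformers 4.33
)

prompt = "Write a short Python function that reverses a string."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```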

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2
- training_steps: 500
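
The card does not state which training library produced these numbers. As a rough, hypothetical sketch, the hyperparameters above could be wired into TRL's `DPOTrainer` roughly as follows; the preference-dataset id and its `prompt`/`chosen`/`rejected` column layout are assumptions, as is the DPO `beta`, which is not reported here.

```python
# Hypothetical DPO training sketch with TRL (not necessarily how this model was trained).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "microsoft/phi-1_5"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
ref_model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)

# Any preference dataset with "prompt", "chosen", "rejected" columns works here;
# the dataset id below is a placeholder.
dataset = load_dataset("your-username/preference-data", split="train")

# Mirrors the hyperparameters listed above; the Adam betas/epsilon on the card
# are the TrainingArguments defaults, so they are not set explicitly.
training_args = TrainingArguments(
    output_dir="phi_1.5_dpo_v3",
    learning_rate=2e-4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=2,
    max_steps=500,
    logging_steps=10,
)

trainer = DPOTrainer(
    model,
    ref_model,
    args=training_args,
    beta=0.1,  # DPO temperature; not reported on the card, 0.1 is TRL's default
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```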

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.4948        | 0.27  | 10   | 0.1734          | -0.0170        | -1.7389          | 1.0                | 1.7219          | -416.2255      | -91.1275     | 4.9609          | 5.8425        |
| 0.0677        | 0.54  | 20   | 0.0006          | -0.1878        | -8.3012          | 1.0                | 8.1134          | -481.8484      | -92.8350     | 4.5966          | 5.7680        |
| 0.0002        | 0.81  | 30   | 0.0000          | -0.3920        | -14.2987         | 1.0                | 13.9067         | -541.8231      | -94.8774     | 4.3134          | 5.6330        |
| 0.0001        | 1.08  | 40   | 0.0000          | -0.5124        | -17.3966         | 1.0                | 16.8841         | -572.8019      | -96.0813     | 4.1996          | 5.5630        |
| 0.0           | 1.35  | 50   | 0.0000          | -0.5702        | -18.7616         | 1.0                | 18.1915         | -586.4525      | -96.6585     | 4.1536          | 5.5312        |
| 0.0           | 1.62  | 60   | 0.0000          | -0.5923        | -19.3013         | 1.0                | 18.7089         | -591.8488      | -96.8803     | 4.1366          | 5.5189        |
| 0.0           | 1.89  | 70   | 0.0000          | -0.6007        | -19.5016         | 1.0                | 18.9008         | -593.8517      | -96.9644     | 4.1304          | 5.5141        |
| 0.0           | 2.16  | 80   | 0.0000          | -0.6059        | -19.6000         | 1.0                | 18.9940         | -594.8359      | -97.0163     | 4.1267          | 5.5114        |
| 0.0           | 2.43  | 90   | 0.0000          | -0.6070        | -19.6481         | 1.0                | 19.0411         | -595.3173      | -97.0273     | 4.1248          | 5.5089        |
| 0.0           | 2.7   | 100  | 0.0000          | -0.6084        | -19.6635         | 1.0                | 19.0551         | -595.4709      | -97.0409     | 4.1232          | 5.5085        |
| 0.0           | 2.97  | 110  | 0.0000          | -0.6088        | -19.6749         | 1.0                | 19.0661         | -595.5847      | -97.0451     | 4.1232          | 5.5081        |
| 0.0           | 3.24  | 120  | 0.0000          | -0.6088        | -19.6989         | 1.0                | 19.0900         | -595.8249      | -97.0454     | 4.1225          | 5.5062        |
| 0.0           | 3.51  | 130  | 0.0000          | -0.6088        | -19.7185         | 1.0                | 19.1097         | -596.0208      | -97.0448     | 4.1200          | 5.5053        |
| 0.0           | 3.78  | 140  | 0.0000          | -0.6091        | -19.7351         | 1.0                | 19.1260         | -596.1869      | -97.0481     | 4.1203          | 5.5043        |
| 0.0           | 4.05  | 150  | 0.0000          | -0.6097        | -19.7339         | 1.0                | 19.1241         | -596.1747      | -97.0545     | 4.1200          | 5.5044        |
| 0.0           | 4.32  | 160  | 0.0000          | -0.6101        | -19.7392         | 1.0                | 19.1291         | -596.2282      | -97.0581     | 4.1191          | 5.5041        |
| 0.0           | 4.59  | 170  | 0.0000          | -0.6095        | -19.7407         | 1.0                | 19.1312         | -596.2433      | -97.0524     | 4.1196          | 5.5041        |
| 0.0           | 4.86  | 180  | 0.0000          | -0.6127        | -19.7737         | 1.0                | 19.1610         | -596.5731      | -97.0837     | 4.1176          | 5.5019        |
| 0.0           | 5.14  | 190  | 0.0000          | -0.6138        | -19.7921         | 1.0                | 19.1784         | -596.7576      | -97.0946     | 4.1164          | 5.5004        |
| 0.0           | 5.41  | 200  | 0.0000          | -0.6132        | -19.7929         | 1.0                | 19.1796         | -596.7647      | -97.0892     | 4.1152          | 5.5001        |
| 0.0           | 5.68  | 210  | 0.0000          | -0.6115        | -19.7954         | 1.0                | 19.1839         | -596.7902      | -97.0723     | 4.1154          | 5.4998        |
| 0.0           | 5.95  | 220  | 0.0000          | -0.6129        | -19.8083         | 1.0                | 19.1954         | -596.9189      | -97.0859     | 4.1143          | 5.4990        |
| 0.0           | 6.22  | 230  | 0.0000          | -0.6153        | -19.8312         | 1.0                | 19.2159         | -597.1479      | -97.1100     | 4.1132          | 5.4973        |
| 0.0           | 6.49  | 240  | 0.0000          | -0.6142        | -19.8468         | 1.0                | 19.2325         | -597.3038      | -97.0994     | 4.1127          | 5.4970        |
| 0.0           | 6.76  | 250  | 0.0000          | -0.6141        | -19.8735         | 1.0                | 19.2594         | -597.5714      | -97.0983     | 4.1111          | 5.4953        |
| 0.0           | 7.03  | 260  | 0.0000          | -0.6148        | -19.8878         | 1.0                | 19.2730         | -597.7144      | -97.1054     | 4.1100          | 5.4941        |
| 0.0           | 7.3   | 270  | 0.0000          | -0.6164        | -19.8937         | 1.0                | 19.2773         | -597.7730      | -97.1213     | 4.1091          | 5.4937        |
| 0.0           | 7.57  | 280  | 0.0000          | -0.6176        | -19.9184         | 1.0                | 19.3009         | -598.0203      | -97.1326     | 4.1079          | 5.4924        |
| 0.0           | 7.84  | 290  | 0.0000          | -0.6191        | -19.9314         | 1.0                | 19.3124         | -598.1504      | -97.1476     | 4.1073          | 5.4910        |
| 0.0           | 8.11  | 300  | 0.0000          | -0.6155        | -19.9405         | 1.0                | 19.3250         | -598.2412      | -97.1125     | 4.1067          | 5.4906        |
| 0.0           | 8.38  | 310  | 0.0000          | -0.6184        | -19.9647         | 1.0                | 19.3463         | -598.4835      | -97.1412     | 4.1057          | 5.4891        |
| 0.0           | 8.65  | 320  | 0.0000          | -0.6201        | -19.9751         | 1.0                | 19.3550         | -598.5868      | -97.1580     | 4.1047          | 5.4883        |
| 0.0           | 8.92  | 330  | 0.0000          | -0.6189        | -19.9759         | 1.0                | 19.3570         | -598.5950      | -97.1458     | 4.1044          | 5.4881        |
| 0.0           | 9.19  | 340  | 0.0000          | -0.6209        | -19.9780         | 1.0                | 19.3572         | -598.6162      | -97.1656     | 4.1039          | 5.4880        |
| 0.0           | 9.46  | 350  | 0.0000          | -0.6196        | -19.9837         | 1.0                | 19.3641         | -598.6727      | -97.1528     | 4.1043          | 5.4878        |
| 0.0           | 9.73  | 360  | 0.0000          | -0.6194        | -19.9866         | 1.0                | 19.3672         | -598.7023      | -97.1515     | 4.1041          | 5.4878        |
| 0.0           | 10.0  | 370  | 0.0000          | -0.6199        | -19.9960         | 1.0                | 19.3761         | -598.7965      | -97.1560     | 4.1030          | 5.4867        |
| 0.0           | 10.27 | 380  | 0.0000          | -0.6209        | -20.0033         | 1.0                | 19.3824         | -598.8690      | -97.1657     | 4.1025          | 5.4862        |
| 0.0           | 10.54 | 390  | 0.0000          | -0.6205        | -20.0132         | 1.0                | 19.3927         | -598.9681      | -97.1625     | 4.1021          | 5.4857        |
| 0.0           | 10.81 | 400  | 0.0000          | -0.6226        | -20.0238         | 1.0                | 19.4012         | -599.0746      | -97.1832     | 4.1013          | 5.4849        |
| 0.0           | 11.08 | 410  | 0.0000          | -0.6207        | -20.0343         | 1.0                | 19.4136         | -599.1791      | -97.1641     | 4.1014          | 5.4846        |
| 0.0           | 11.35 | 420  | 0.0000          | -0.6215        | -20.0337         | 1.0                | 19.4122         | -599.1733      | -97.1719     | 4.1010          | 5.4847        |
| 0.0           | 11.62 | 430  | 0.0000          | -0.6212        | -20.0356         | 1.0                | 19.4144         | -599.1924      | -97.1693     | 4.1008          | 5.4845        |
| 0.0           | 11.89 | 440  | 0.0000          | -0.6216        | -20.0326         | 1.0                | 19.4111         | -599.1625      | -97.1727     | 4.1007          | 5.4847        |
| 0.0           | 12.16 | 450  | 0.0000          | -0.6219        | -20.0401         | 1.0                | 19.4182         | -599.2375      | -97.1761     | 4.0998          | 5.4838        |
| 0.0           | 12.43 | 460  | 0.0000          | -0.6225        | -20.0430         | 1.0                | 19.4205         | -599.2663      | -97.1819     | 4.1004          | 5.4836        |
| 0.0           | 12.7  | 470  | 0.0000          | -0.6230        | -20.0486         | 1.0                | 19.4255         | -599.3220      | -97.1875     | 4.1003          | 5.4836        |
| 0.0           | 12.97 | 480  | 0.0000          | -0.6225        | -20.0484         | 1.0                | 19.4259         | -599.3201      | -97.1819     | 4.1002          | 5.4834        |
| 0.0           | 13.24 | 490  | 0.0000          | -0.6209        | -20.0524         | 1.0                | 19.4315         | -599.3601      | -97.1659     | 4.1000          | 5.4831        |
| 0.0           | 13.51 | 500  | 0.0000          | -0.6229        | -20.0554         | 1.0                | 19.4326         | -599.3906      | -97.1858     | 4.0995          | 5.4831        |


### Framework versions

- Transformers 4.33.0
- Pytorch 2.0.1+cu117
- Datasets 2.1.0
- Tokenizers 0.13.3
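
For reproducibility, a local environment can be checked against these pins with a small convenience snippet like the one below (not part of the original card).

```python
# Quick sanity check that the installed packages match the versions listed above.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": "4.33.0",
    "torch": "2.0.1+cu117",
    "datasets": "2.1.0",
    "tokenizers": "0.13.3",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    have = installed[name]
    status = "OK" if have == want else f"mismatch (expected {want})"
    print(f"{name}: {have} -> {status}")
```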