
gpt2-wikitext2-LONG

This model is a fine-tuned version of gpt2 on an unknown dataset (presumably WikiText-2, given the model name). It achieves the following results on the evaluation set:

  • Loss: 8.5448
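
For context, if this value is the usual mean per-token cross-entropy in nats, it corresponds to a perplexity of roughly exp(8.5448) ≈ 5.1 × 10³. Below is a minimal, hedged sketch of loading the checkpoint for generation with the transformers library; the Hub id testingtime/gpt2-wikitext2-LONG is an assumption based on this card's repository name, so adjust it if your copy lives elsewhere.

```python
# Minimal sketch: load the fine-tuned checkpoint and generate text.
# Assumes the Hub id is testingtime/gpt2-wikitext2-LONG; replace with a local path if needed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "testingtime/gpt2-wikitext2-LONG"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The history of the city begins", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```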

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • training_steps: 250000
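
The card does not say which training script produced these values, but as a rough sketch they map onto transformers.TrainingArguments as follows; every argument name is the standard Trainer parameter, and the per-epoch evaluation/logging cadence is an assumption inferred from the results table below, not something the card states.

```python
# Hedged sketch: the listed hyperparameters expressed as Hugging Face TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gpt2-wikitext2-LONG",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    max_steps=250_000,
    eval_strategy="epoch",     # assumption: the table reports one evaluation per epoch
    logging_strategy="epoch",  # assumption: training loss is also logged per epoch
)
```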

Training results

| Training Loss | Epoch | Step | Validation Loss |
|:---:|:---:|:---:|:---:|
| 6.2474 | 1.0 | 2240 | 6.1890 |
| 5.8841 | 2.0 | 4480 | 5.8967 |
| 5.5679 | 3.0 | 6720 | 5.6904 |
| 5.3801 | 4.0 | 8960 | 5.5438 |
| 5.1759 | 5.0 | 11200 | 5.4250 |
| 5.0068 | 6.0 | 13440 | 5.3377 |
| 4.818 | 7.0 | 15680 | 5.2648 |
| 4.6836 | 8.0 | 17920 | 5.2054 |
| 4.5107 | 9.0 | 20160 | 5.1622 |
| 4.3756 | 10.0 | 22400 | 5.1392 |
| 4.2265 | 11.0 | 24640 | 5.1194 |
| 4.0932 | 12.0 | 26880 | 5.1065 |
| 3.9449 | 13.0 | 29120 | 5.1109 |
| 3.8233 | 14.0 | 31360 | 5.1250 |
| 3.6796 | 15.0 | 33600 | 5.1433 |
| 3.5556 | 16.0 | 35840 | 5.1692 |
| 3.4383 | 17.0 | 38080 | 5.2086 |
| 3.299 | 18.0 | 40320 | 5.2423 |
| 3.1903 | 19.0 | 42560 | 5.2913 |
| 3.0618 | 20.0 | 44800 | 5.3327 |
| 2.9429 | 21.0 | 47040 | 5.3867 |
| 2.8275 | 22.0 | 49280 | 5.4452 |
| 2.7206 | 23.0 | 51520 | 5.5040 |
| 2.6081 | 24.0 | 53760 | 5.5760 |
| 2.5133 | 25.0 | 56000 | 5.6352 |
| 2.3831 | 26.0 | 58240 | 5.6871 |
| 2.2795 | 27.0 | 60480 | 5.7515 |
| 2.2009 | 28.0 | 62720 | 5.8257 |
| 2.0864 | 29.0 | 64960 | 5.8798 |
| 2.0069 | 30.0 | 67200 | 5.9585 |
| 1.9058 | 31.0 | 69440 | 6.0158 |
| 1.8336 | 32.0 | 71680 | 6.0893 |
| 1.7406 | 33.0 | 73920 | 6.1480 |
| 1.6725 | 34.0 | 76160 | 6.2075 |
| 1.5814 | 35.0 | 78400 | 6.2683 |
| 1.5209 | 36.0 | 80640 | 6.3362 |
| 1.4352 | 37.0 | 82880 | 6.4068 |
| 1.3732 | 38.0 | 85120 | 6.4493 |
| 1.3004 | 39.0 | 87360 | 6.5188 |
| 1.2466 | 40.0 | 89600 | 6.5716 |
| 1.1749 | 41.0 | 91840 | 6.6248 |
| 1.1317 | 42.0 | 94080 | 6.6937 |
| 1.0588 | 43.0 | 96320 | 6.7596 |
| 1.0154 | 44.0 | 98560 | 6.8063 |
| 0.9544 | 45.0 | 100800 | 6.8594 |
| 0.918 | 46.0 | 103040 | 6.9139 |
| 0.8603 | 47.0 | 105280 | 6.9788 |
| 0.8228 | 48.0 | 107520 | 7.0178 |
| 0.7757 | 49.0 | 109760 | 7.0820 |
| 0.7445 | 50.0 | 112000 | 7.1300 |
| 0.6947 | 51.0 | 114240 | 7.1802 |
| 0.6559 | 52.0 | 116480 | 7.2233 |
| 0.6281 | 53.0 | 118720 | 7.2744 |
| 0.5912 | 54.0 | 120960 | 7.3109 |
| 0.5713 | 55.0 | 123200 | 7.3557 |
| 0.537 | 56.0 | 125440 | 7.3980 |
| 0.5166 | 57.0 | 127680 | 7.4294 |
| 0.4882 | 58.0 | 129920 | 7.4812 |
| 0.4662 | 59.0 | 132160 | 7.5245 |
| 0.4427 | 60.0 | 134400 | 7.5481 |
| 0.4272 | 61.0 | 136640 | 7.5961 |
| 0.4046 | 62.0 | 138880 | 7.6457 |
| 0.395 | 63.0 | 141120 | 7.6701 |
| 0.3717 | 64.0 | 143360 | 7.7151 |
| 0.359 | 65.0 | 145600 | 7.7493 |
| 0.3435 | 66.0 | 147840 | 7.7703 |
| 0.333 | 67.0 | 150080 | 7.8155 |
| 0.3163 | 68.0 | 152320 | 7.8550 |
| 0.3074 | 69.0 | 154560 | 7.8780 |
| 0.2945 | 70.0 | 156800 | 7.9197 |
| 0.2866 | 71.0 | 159040 | 7.9441 |
| 0.2733 | 72.0 | 161280 | 7.9762 |
| 0.2655 | 73.0 | 163520 | 7.9940 |
| 0.2559 | 74.0 | 165760 | 8.0210 |
| 0.2489 | 75.0 | 168000 | 8.0440 |
| 0.2399 | 76.0 | 170240 | 8.0695 |
| 0.229 | 77.0 | 172480 | 8.0998 |
| 0.2254 | 78.0 | 174720 | 8.1213 |
| 0.2159 | 79.0 | 176960 | 8.1404 |
| 0.2118 | 80.0 | 179200 | 8.1594 |
| 0.2042 | 81.0 | 181440 | 8.1839 |
| 0.199 | 82.0 | 183680 | 8.2196 |
| 0.1935 | 83.0 | 185920 | 8.2277 |
| 0.1882 | 84.0 | 188160 | 8.2494 |
| 0.1826 | 85.0 | 190400 | 8.2727 |
| 0.1793 | 86.0 | 192640 | 8.2852 |
| 0.1732 | 87.0 | 194880 | 8.3022 |
| 0.1703 | 88.0 | 197120 | 8.3139 |
| 0.1647 | 89.0 | 199360 | 8.3354 |
| 0.1625 | 90.0 | 201600 | 8.3469 |
| 0.1579 | 91.0 | 203840 | 8.3671 |
| 0.154 | 92.0 | 206080 | 8.3825 |
| 0.1506 | 93.0 | 208320 | 8.3879 |
| 0.147 | 94.0 | 210560 | 8.4059 |
| 0.143 | 95.0 | 212800 | 8.4183 |
| 0.1403 | 96.0 | 215040 | 8.4287 |
| 0.1371 | 97.0 | 217280 | 8.4522 |
| 0.1351 | 98.0 | 219520 | 8.4547 |
| 0.1306 | 99.0 | 221760 | 8.4614 |
| 0.1294 | 100.0 | 224000 | 8.4809 |
| 0.126 | 101.0 | 226240 | 8.4951 |
| 0.1235 | 102.0 | 228480 | 8.4978 |
| 0.1213 | 103.0 | 230720 | 8.5041 |
| 0.1195 | 104.0 | 232960 | 8.5161 |
| 0.1174 | 105.0 | 235200 | 8.5176 |
| 0.1147 | 106.0 | 237440 | 8.5268 |
| 0.1134 | 107.0 | 239680 | 8.5325 |
| 0.1123 | 108.0 | 241920 | 8.5376 |
| 0.1101 | 109.0 | 244160 | 8.5404 |
| 0.1082 | 110.0 | 246400 | 8.5439 |
| 0.1083 | 111.0 | 248640 | 8.5434 |
| 0.1069 | 111.6071 | 250000 | 8.5448 |
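
Note that the validation loss bottoms out around epoch 12 (5.1065) and rises steadily afterwards while the training loss keeps falling, i.e. the model overfits for most of the run; the reported final loss of 8.5448 comes from the last step rather than the best checkpoint. If you retrain, a hedged sketch of keeping the best checkpoint with standard Trainer options (the original run's settings are not documented on this card) could look like this:

```python
# Hedged sketch: retain the checkpoint with the lowest validation loss instead of the last one.
from transformers import TrainingArguments, EarlyStoppingCallback

args = TrainingArguments(
    output_dir="gpt2-wikitext2-best",
    eval_strategy="epoch",
    save_strategy="epoch",               # must match eval_strategy for best-model tracking
    load_best_model_at_end=True,         # restore the lowest-eval-loss checkpoint when training ends
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
# Passing EarlyStoppingCallback(early_stopping_patience=3) via Trainer(callbacks=...) would stop
# the run once the validation loss has not improved for three consecutive evaluations.
```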

Framework versions

  • Transformers 4.44.2
  • Pytorch 2.4.0+cu121
  • Datasets 3.0.0
  • Tokenizers 0.19.1
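
To check that a local environment matches these versions, each of the listed libraries exposes a standard __version__ attribute; nothing here is specific to this model.

```python
# Quick sanity check of installed library versions against the card.
import transformers, torch, datasets, tokenizers

print("transformers", transformers.__version__)  # expected 4.44.2
print("torch", torch.__version__)                # expected 2.4.0+cu121
print("datasets", datasets.__version__)          # expected 3.0.0
print("tokenizers", tokenizers.__version__)      # expected 0.19.1
```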