
# distily_bench_gpt2_attn_part_2

This student model is distilled from the teacher model gpt2 on an unspecified dataset.

The Distily library was used for this distillation.
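
As a minimal usage sketch (assuming the `transformers` library; the repo id is taken from this card), the student loads like any other GPT-2 checkpoint:

```python
# Minimal usage sketch: the distilled student is a standard GPT-2-style
# causal LM and loads with the usual transformers API.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "lapp0/distily_bench_gpt2_attn_part_2"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Knowledge distillation compresses a teacher model", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```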

It achieves the following results on the evaluation set:

- eval_enwikippl: 234.3043
- eval_frwikippl: 1329.5667
- eval_zhwikippl: 575.8531
- eval_loss: 2.4344
- eval_runtime: 87.576 (seconds)
- eval_samples_per_second: 57.093
- eval_steps_per_second: 7.137

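The ppl metrics are perplexities on English, French, and Chinese Wikipedia evaluation text, respectively. Note that eval_loss tracks the distillation objective rather than a plain language-modeling loss, so these perplexities are not simply exp(eval_loss). As a hedged sketch of how a causal-LM perplexity of this kind can be computed (the exact Distily evaluation pipeline may differ in batching, sequence length, and corpus selection):

```python
# Hedged sketch: perplexity of the student on a text sample. Distily's
# actual enwikippl/frwikippl/zhwikippl evaluation code may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "lapp0/distily_bench_gpt2_attn_part_2"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id).eval()

text = "Paris is the capital and most populous city of France."
enc = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    # Causal LMs shift labels internally, so passing input_ids as labels
    # yields the mean next-token cross-entropy over the sequence.
    loss = model(**enc, labels=enc["input_ids"]).loss
print(f"perplexity: {torch.exp(loss).item():.4f}")
```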
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- distillation_objective: `DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=2.0, loss_fn=cos, layer_mapper=None, projector=None))` (KL loss on the logits with weight 1, cosine loss on the attentions with weight 2.0, hidden-state component disabled; see the sketch after this list)
- train_embeddings: True
- learning_rate: 4e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- num_epochs: 1.0

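A minimal sketch of what this objective computes, assuming student and teacher forward passes run with `output_attentions=True`; the function and names below are illustrative, not Distily's actual API:

```python
# Hedged sketch of the configured objective: KL divergence on the logits
# (weight 1) plus a cosine loss on the attention maps (weight 2.0). The
# hidden-state component has weight 0, so it is omitted here.
import torch.nn.functional as F

def distillation_loss(student_out, teacher_out, logits_weight=1.0, attn_weight=2.0):
    # Logits component: KL(teacher || student) over the vocabulary.
    student_logp = F.log_softmax(student_out.logits, dim=-1)
    teacher_p = F.softmax(teacher_out.logits, dim=-1)
    logits_loss = F.kl_div(student_logp, teacher_p, reduction="batchmean")

    # Attention component: 1 - cosine similarity, averaged over layers.
    # With layer_mapper=None and matching depths, layers pair one-to-one.
    attn_loss = 0.0
    for s_attn, t_attn in zip(student_out.attentions, teacher_out.attentions):
        sim = F.cosine_similarity(s_attn.flatten(1), t_attn.flatten(1), dim=-1)
        attn_loss += (1.0 - sim).mean()
    attn_loss /= len(student_out.attentions)

    return logits_weight * logits_loss + attn_weight * attn_loss
```

Since the student here is GPT-2-sized (124M parameters, same depth as the teacher), attention maps can be matched layer-for-layer without a layer mapper or projector, which is why both are None in the objective above.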
## Resource Usage

Peak GPU Memory: 8.2206 GB
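
A hedged sketch of one way to measure a peak-memory figure like this, using PyTorch's allocator statistics (Distily's actual bookkeeping may differ):

```python
# Hedged sketch: track peak GPU memory around a training/eval run via
# PyTorch allocator stats; Distily's actual measurement may differ.
import torch

torch.cuda.reset_peak_memory_stats()
# ... run the distillation / evaluation workload here ...
peak_gb = torch.cuda.max_memory_allocated() / 1e9
print(f"Peak GPU Memory: {peak_gb:.4f} GB")
```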

## Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime (s) | samples_per_second | steps_per_second | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 30.2086 | 57.2728 | | | | | 18.1784 |
| 0 | 0 | 56314.7695 | 59887.2773 | 7.8201 | 86.1883 | 58.013 | 7.252 | 59033.8086 |
| 1000 | 0.0162 | 792.2872 | 4804.1196 | 3.2178 | 86.0729 | 58.09 | 7.261 | 14971.4619 |
| 2000 | 0.0323 | 563.5076 | 3594.0178 | 3.0017 | 86.8465 | 57.573 | 7.197 | 2436.5176 |
| 3000 | 0.0485 | 462.8162 | 3038.3840 | 2.8886 | 86.4026 | 57.869 | 7.234 | 990.3070 |
| 4000 | 0.0646 | 394.8591 | 2549.8823 | 2.7877 | 86.3461 | 57.907 | 7.238 | 1110.5270 |
| 5000 | 0.0808 | 353.7159 | 2113.5315 | 2.6983 | 86.5685 | 57.758 | 7.22 | 845.9332 |
| 6000 | 0.0970 | 305.3852 | 1907.5966 | 2.6161 | 86.5704 | 57.756 | 7.22 | 792.3533 |
| 7000 | 0.1131 | 275.6120 | 1710.8368 | 2.5482 | 89.5827 | 55.814 | 6.977 | 704.9745 |
| 8000 | 0.1293 | 248.5870 | 1491.7041 | 2.4852 | 87.8996 | 56.883 | 7.11 | 655.4906 |
| 9000 | 0.1455 | 234.3043 | 1329.5667 | 2.4344 | 87.576 | 57.093 | 7.137 | 575.8531 |
| 10000 | 0.1616 | 210.1025 | 1200.8650 | 2.3778 | 88.2437 | 56.661 | 7.083 | 680.8273 |
| 11000 | 0.1778 | 195.9659 | 1137.0879 | 2.3352 | 87.2617 | 57.299 | 7.162 | 574.3172 |
| 12000 | 0.1939 | 177.5484 | 986.7017 | 2.2840 | 88.3708 | 56.58 | 7.072 | 511.2562 |
| 13000 | 0.2101 | 168.3902 | 992.2828 | 2.2496 | 86.6049 | 57.733 | 7.217 | 493.3487 |
| 14000 | 0.2263 | 159.1225 | 889.1183 | 2.2152 | 86.6651 | 57.693 | 7.212 | 434.3372 |
| 15000 | 0.2424 | 153.0509 | 800.1130 | 2.1876 | 86.8087 | 57.598 | 7.2 | 389.5484 |
| 16000 | 0.2586 | 146.4697 | 801.1292 | 2.1678 | 86.6505 | 57.703 | 7.213 | 490.2618 |
| 17000 | 0.2747 | 143.1525 | 782.1525 | 2.1519 | 86.9013 | 57.537 | 7.192 | 536.9359 |
| 18000 | 0.2909 | 139.4116 | 832.3362 | 2.1366 | 86.6158 | 57.726 | 7.216 | 568.2902 |
| 19000 | 0.3071 | 134.7601 | 733.2869 | 2.1223 | 86.3964 | 57.873 | 7.234 | 516.8853 |
| 20000 | 0.3232 | 132.8793 | 726.0842 | 2.1108 | 86.4694 | 57.824 | 7.228 | 376.9092 |
| 21000 | 0.3394 | 130.5677 | 658.1619 | 2.0982 | 86.9266 | 57.52 | 7.19 | 386.2850 |
| 22000 | 0.3556 | 130.3043 | 657.9764 | 2.0894 | 87.5122 | 57.135 | 7.142 | 418.9560 |
| 23000 | 0.3717 | 128.9555 | 687.2317 | 2.0831 | 87.0863 | 57.414 | 7.177 | 419.0120 |
| 24000 | 0.3879 | 125.7514 | 657.8835 | 2.0732 | 86.5642 | 57.761 | 7.22 | 391.6348 |
| 25000 | 0.4040 | 124.6431 | 676.3678 | 2.0703 | 86.6818 | 57.682 | 7.21 | 384.3811 |
| 26000 | 0.4202 | 124.4787 | 653.3537 | 2.0597 | 87.1964 | 57.342 | 7.168 | 403.9578 |
| 27000 | 0.4364 | 122.7223 | 659.5090 | 2.0546 | 86.5911 | 57.743 | 7.218 | 316.7159 |
| 28000 | 0.4525 | 123.1997 | 631.8794 | 2.0501 | 86.9662 | 57.494 | 7.187 | 295.1540 |
| 29000 | 0.4687 | 122.9990 | 659.6486 | 2.0423 | 87.0187 | 57.459 | 7.182 | 310.8498 |
| 30000 | 0.4848 | 123.1041 | 617.8698 | 2.0464 | 87.3158 | 57.263 | 7.158 | 315.3653 |
| 31000 | 0.5010 | 120.1201 | 613.0962 | 2.0351 | 87.311 | 57.267 | 7.158 | 308.4515 |
| 32000 | 0.5172 | 119.4781 | 630.6332 | 2.0323 | 87.0327 | 57.45 | 7.181 | 274.0701 |
| 33000 | 0.5333 | 118.0396 | 624.0867 | 2.0277 | 86.8174 | 57.592 | 7.199 | 285.6919 |
| 34000 | 0.5495 | 119.1354 | 588.2814 | 2.0249 | 86.7837 | 57.614 | 7.202 | 316.9274 |
| 35000 | 0.5657 | 117.2539 | 567.9440 | 2.0212 | 87.7979 | 56.949 | 7.119 | 303.2645 |
| 36000 | 0.5818 | 117.5548 | 608.1882 | 2.0241 | 86.4044 | 57.867 | 7.233 | 256.7451 |
| 37000 | 0.5980 | 116.6455 | 600.1797 | 2.0164 | 87.1429 | 57.377 | 7.172 | 345.4349 |
| 38000 | 0.6141 | 116.6998 | 564.5506 | 2.0140 | 87.5854 | 57.087 | 7.136 | 283.1472 |
| 39000 | 0.6303 | 113.8975 | 538.0084 | 2.0085 | 86.8118 | 57.596 | 7.199 | 285.9208 |
| 40000 | 0.6465 | 115.6533 | 579.6762 | 2.0130 | 87.6272 | 57.06 | 7.132 | 268.8502 |
| 41000 | 0.6626 | 114.0037 | 569.7087 | 2.0107 | 86.8951 | 57.541 | 7.193 | 298.6428 |
| 42000 | 0.6788 | 114.4206 | 558.2177 | 2.0114 | 86.4056 | 57.867 | 7.233 | 325.7667 |
| 43000 | 0.6949 | 114.1277 | 570.9554 | 2.0067 | 86.7633 | 57.628 | 7.204 | 297.6475 |
| 44000 | 0.7111 | 112.6310 | 603.2343 | 2.0041 | 87.2036 | 57.337 | 7.167 | 265.9578 |
| 45000 | 0.7273 | 112.3951 | 582.9551 | 1.9978 | 86.1934 | 58.009 | 7.251 | 276.3855 |
| 46000 | 0.7434 | 112.9463 | 591.3171 | 1.9976 | 86.5934 | 57.741 | 7.218 | 270.1097 |
| 47000 | 0.7596 | 112.1510 | 564.6300 | 1.9943 | 86.827 | 57.586 | 7.198 | 323.6853 |
| 48000 | 0.7758 | 112.8236 | 513.6188 | 1.9968 | 86.3974 | 57.872 | 7.234 | 305.2553 |
| 49000 | 0.7919 | 112.7886 | 565.1480 | 1.9948 | 86.5571 | 57.765 | 7.221 | 276.0167 |
| 50000 | 0.8081 | 111.8900 | 592.2350 | 1.9932 | 86.9533 | 57.502 | 7.188 | 247.9840 |
| 51000 | 0.8242 | 111.4391 | 588.3229 | 1.9920 | 86.4566 | 57.832 | 7.229 | 298.2842 |
| 52000 | 0.8404 | 109.9350 | 549.1997 | 1.9904 | 86.5867 | 57.746 | 7.218 | 318.3695 |
| 53000 | 0.8566 | 110.8264 | 544.7263 | 1.9856 | 87.1758 | 57.355 | 7.169 | 311.5147 |
| 54000 | 0.8727 | 111.0849 | 544.9952 | 1.9857 | 87.388 | 57.216 | 7.152 | 334.9867 |
| 55000 | 0.8889 | 111.1799 | 602.7242 | 1.9865 | 86.8376 | 57.579 | 7.197 | 265.1776 |
| 56000 | 0.9051 | 111.2490 | 553.5536 | 1.9821 | 87.5367 | 57.119 | 7.14 | 326.5943 |
| 57000 | 0.9212 | 110.1914 | 582.8317 | 1.9870 | 87.0656 | 57.428 | 7.178 | 1162.1104 |
| 58000 | 0.9374 | 109.1016 | 657.6982 | 1.9860 | 86.6253 | 57.72 | 7.215 | 322.6926 |
| 59000 | 0.9535 | 111.8119 | 596.5937 | 1.9831 | 86.3097 | 57.931 | 7.241 | 408.6782 |
| 60000 | 0.9697 | 108.9746 | 586.0871 | 1.9748 | 86.6963 | 57.673 | 7.209 | 268.2405 |
| 61000 | 0.9859 | 109.9862 | 560.1890 | 1.9777 | 86.8734 | 57.555 | 7.194 | 294.4846 |
| 61875 | 1.0 | 110.1401 | 573.6587 | 1.9760 | 86.766 | 57.626 | 7.203 | 273.4851 |
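
For a quick look at convergence, a few points from the table can be dropped into a plot; a hedged sketch (values transcribed from the table above, not recomputed):

```python
# Hedged sketch: plot a few (step, enwikippl) points from the table above
# to eyeball convergence against the teacher's perplexity.
import matplotlib.pyplot as plt

steps = [0, 1000, 9000, 30000, 61875]
enwikippl = [56314.7695, 792.2872, 234.3043, 123.1041, 110.1401]

fig, ax = plt.subplots()
ax.plot(steps, enwikippl, marker="o", label="student enwikippl")
ax.set_yscale("log")  # perplexity drops by orders of magnitude early on
ax.set_xlabel("step")
ax.set_ylabel("enwikippl")
ax.axhline(30.2086, linestyle="--", label="teacher enwikippl")
ax.legend()
plt.show()
```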

## Framework versions

- Distily 0.2.0
- Transformers 4.44.0
- Pytorch 2.3.0
- Datasets 2.20.0