# gpt2_model_card_distily_test
This student model was distilled from the teacher model gpt2 on an unspecified dataset, using the Distily library.
It achieves the following results on the evaluation set:
- eval_enwikippl: 18261.1387
- eval_frwikippl: 38633.1055
- eval_zhwikippl: 52085.4805
- eval_loss: 0.0005
- eval_runtime: 0.0656
- eval_samples_per_second: 15.248
- eval_steps_per_second: 15.248
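
For reference, here is a minimal usage sketch (not part of the original card) for loading the student with Hugging Face Transformers; the repo id `lapp0/gpt2_model_card_distily_test` and the prompt are illustrative assumptions, so adjust them to where the checkpoint actually lives.

```python
# Minimal usage sketch: load the distilled student with Hugging Face Transformers.
# The repo id below is assumed from this card's title; change it if the checkpoint
# is stored elsewhere.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "lapp0/gpt2_model_card_distily_test"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

prompt = "Knowledge distillation compresses a teacher model into"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```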
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- distillation_strategy: logits_activations
- loss_fn: reverse_kl (see the sketch below the list)
- train_embeddings: True
- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 2
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- num_epochs: 1.0
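
As a rough illustration of the `reverse_kl` loss listed above, the sketch below computes KL(student ‖ teacher) over the vocabulary from the two models' logits. This is an assumed re-implementation for clarity, not Distily's actual code; temperature scaling and padding masks are omitted.

```python
# Illustrative reverse-KL distillation loss over logits: KL(student || teacher),
# averaged over all token positions. Assumed sketch, not Distily's implementation.
import torch
import torch.nn.functional as F

def reverse_kl_loss(student_logits: torch.Tensor,
                    teacher_logits: torch.Tensor) -> torch.Tensor:
    """Both tensors have shape (batch, seq_len, vocab_size)."""
    student_logp = F.log_softmax(student_logits, dim=-1)
    teacher_logp = F.log_softmax(teacher_logits, dim=-1)
    # KL(S || T) = sum_v S(v) * (log S(v) - log T(v)) at each token position
    kl_per_token = (student_logp.exp() * (student_logp - teacher_logp)).sum(dim=-1)
    return kl_per_token.mean()

# Sanity check: identical logits give (near-)zero loss.
logits = torch.randn(1, 4, 50257)
assert reverse_kl_loss(logits, logits).abs() < 1e-5
```

Unlike the forward direction, reverse KL is mode-seeking: the student is penalized for placing probability mass where the teacher assigns little, which pushes it toward the teacher's dominant modes.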
### Resource Usage
- Peak GPU Memory: 1.2411 GB
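
A peak figure like the one above can be read out with PyTorch's CUDA memory statistics; the snippet below is an assumed illustration, not the measurement code Distily used.

```python
# Illustrative peak-GPU-memory readout with PyTorch (assumed; not Distily's code).
import torch

if torch.cuda.is_available():
    torch.cuda.reset_peak_memory_stats()
    # ... run the training / evaluation workload here ...
    peak_gb = torch.cuda.max_memory_allocated() / 1e9
    print(f"Peak GPU Memory: {peak_gb:.4f} GB")
```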
### Model Results
`eval_` metrics:
| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | zhwikippl |
|---|---|---|---|---|---|---|---|---|
| teacher eval | | 30.2266 | 57.3005 | | | | | 18.1903 |
| 0 | 0 | 58974.8945 | 59857.6992 | 0.0042 | 0.1173 | 8.525 | 8.525 | 60252.3672 |
| 30 | 0.3030 | 26646.1797 | 43684.125 | 0.0006 | 0.0661 | 15.123 | 15.123 | 53511.3242 |
| 60 | 0.6061 | 18083.6934 | 38626.9922 | 0.0005 | 0.0647 | 15.459 | 15.459 | 53146.3672 |
| 90 | 0.9091 | 18261.8535 | 38627.6914 | 0.0005 | 0.0656 | 15.248 | 15.248 | 52085.4805 |
| 99 | 1.0 | 18261.1387 | 38633.1055 | 0.0005 | 0.0656 | 15.248 | 15.248 | 52085.4805 |
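
The `*wikippl` columns presumably report perplexity on English, French, and Chinese Wikipedia text. A perplexity of this kind is the exponential of the mean per-token cross-entropy; the sketch below shows one way to compute it with Transformers, and is an assumed illustration rather than Distily's evaluation code.

```python
# Illustrative perplexity computation for a causal LM: exp of the mean
# per-token cross-entropy. Assumed sketch, not Distily's evaluation code.
import math
import torch

@torch.no_grad()
def perplexity(model, tokenizer, texts, device="cpu"):
    model.to(device).eval()
    total_nll, total_tokens = 0.0, 0
    for text in texts:
        enc = tokenizer(text, return_tensors="pt").to(device)
        # Passing labels=input_ids makes the model return the mean cross-entropy
        # over the shifted next-token targets.
        out = model(**enc, labels=enc["input_ids"])
        n_targets = enc["input_ids"].size(1) - 1
        total_nll += out.loss.item() * n_targets
        total_tokens += n_targets
    return math.exp(total_nll / total_tokens)
```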
### Framework versions
- Distily 0.1.0
- Transformers 4.43.3
- Pytorch 2.3.0
- Datasets 2.20.0