distily_bench_obj_cross_v2.4

This student model was distilled from the teacher model roneneldan/TinyStories-33M. The training dataset is not specified.

The Distily library was used for this distillation.

It achieves the following results on the evaluation set:

  • eval_enwikippl: 49928.3438
  • eval_frwikippl: 60082.1211
  • eval_zhwikippl: 75499.0547
  • eval_tinystoriesppl: 44922.0352
  • eval_loss: 6.1235
  • eval_runtime: 13.0526
  • eval_samples_per_second: 76.613
  • eval_steps_per_second: 9.577
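
As a quick sanity check, the throughput figures above can be cross-checked against each other. The sketch below only uses the reported numbers; the interpretation (a single evaluation device, so that samples per step equals eval_batch_size) is an assumption, not something the card states.

```python
# Reported evaluation throughput figures from the list above.
eval_runtime = 13.0526        # seconds
samples_per_second = 76.613
steps_per_second = 9.577

# Samples and steps processed during one evaluation pass.
eval_samples = eval_runtime * samples_per_second
eval_steps = eval_runtime * steps_per_second

# Samples per step; this should be close to eval_batch_size (8)
# if evaluation ran on a single device (assumption).
implied_batch_size = samples_per_second / steps_per_second

print(round(eval_samples), round(eval_steps), round(implied_batch_size, 2))
```

Under that assumption, the figures are consistent with roughly 1000 evaluation samples processed in about 125 steps of 8 samples each.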

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • distillation_objective: DistillationObjective(logits_loss_component=LossComponent(label=logits, weight=1, loss_fn=kl, layer_mapper=None, projector=None), hs_loss_component=LossComponent(label=hs, weight=0, loss_fn=None, layer_mapper=None, projector=None), attn_loss_component=LossComponent(label=attn, weight=0, loss_fn=None, layer_mapper=None, projector=None))
  • train_embeddings: True
  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 1.0
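
The distillation_objective above puts all of the weight on a KL-divergence loss over logits, with the hidden-state (hs) and attention (attn) components weighted 0. The following is a minimal pure-Python sketch of that idea for a single token position. It is not Distily's implementation: the function names are hypothetical, and the direction KL(teacher || student) is an assumption, chosen because it is the common convention in logit distillation.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def kl_logits_loss(teacher_logits, student_logits):
    """KL(teacher || student) for one token position (assumed direction)."""
    t_logp = log_softmax(teacher_logits)
    s_logp = log_softmax(student_logits)
    return sum(math.exp(t) * (t - s) for t, s in zip(t_logp, s_logp))

def total_loss(logits_loss, hs_loss=0.0, attn_loss=0.0,
               weights=(1.0, 0.0, 0.0)):
    """Weighted sum of loss components; default weights mirror the card."""
    w_logits, w_hs, w_attn = weights
    return w_logits * logits_loss + w_hs * hs_loss + w_attn * attn_loss

# Toy logits over a 3-token vocabulary (illustrative values only).
teacher = [2.0, 0.5, -1.0]
student = [0.1, 0.2, 0.3]
print(total_loss(kl_logits_loss(teacher, student)))
```

With weights (1, 0, 0), the total loss reduces to the logits KL term alone, so the hs and attn components do not influence training in this configuration.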

Resource Usage

Peak GPU Memory: 8.0568 GB

Eval-Phase Metrics

| step | epoch | enwikippl | frwikippl | loss | runtime | samples_per_second | steps_per_second | tinystoriesppl | zhwikippl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| teacher eval | | 169.9865 | 47377.9414 | | | | | 3.9789 | 4998.1294 |
| 0 | 0 | 77152.8516 | 72247.2109 | 6.5107 | 12.9698 | 77.102 | 9.638 | 77917.4219 | 77892.9766 |
| 500 | 0.0404 | 49943.8203 | 60082.1211 | 6.1230 | 12.9612 | 77.153 | 9.644 | 44951.7734 | 75499.0547 |
| 1000 | 0.0808 | 49943.8203 | 60082.1211 | 6.1230 | 12.975 | 77.071 | 9.634 | 44951.7734 | 75499.0547 |
| 1500 | 0.1212 | 49943.8203 | 60082.1211 | 6.1230 | 12.9603 | 77.159 | 9.645 | 44944.3164 | 75499.0547 |
| 2000 | 0.1616 | 49943.8203 | 60082.1211 | 6.1235 | 12.9533 | 77.2 | 9.65 | 44922.0352 | 75499.0547 |
| 2500 | 0.2020 | 49943.8203 | 60082.1211 | 6.1233 | 12.9601 | 77.16 | 9.645 | 44922.0352 | 75499.0547 |
| 3000 | 0.2424 | 49920.5820 | 60082.1211 | 6.1233 | 12.9557 | 77.186 | 9.648 | 44907.2148 | 75499.0547 |
| 3500 | 0.2828 | 49920.5820 | 60082.1211 | 6.1233 | 12.9662 | 77.124 | 9.64 | 44907.2148 | 75499.0547 |
| 4000 | 0.3232 | 49920.5820 | 60082.1211 | 6.1233 | 12.9824 | 77.027 | 9.628 | 44907.2148 | 75499.0547 |
| 4500 | 0.3636 | 49920.5820 | 60082.1211 | 6.1233 | 12.9965 | 76.944 | 9.618 | 44907.2148 | 75499.0547 |
| 5000 | 0.4040 | 49928.3438 | 60082.1211 | 6.1235 | 13.1112 | 76.271 | 9.534 | 44922.0352 | 75499.0547 |
| 5500 | 0.4444 | 49928.3438 | 60082.1211 | 6.1235 | 13.1865 | 75.835 | 9.479 | 44922.0352 | 75499.0547 |
| 6000 | 0.4848 | 49928.3438 | 60082.1211 | 6.1235 | 13.0376 | 76.701 | 9.588 | 44922.0352 | 75499.0547 |
| 6500 | 0.5253 | 49928.3438 | 60082.1211 | 6.1235 | 12.9934 | 76.962 | 9.62 | 44922.0352 | 75499.0547 |
| 7000 | 0.5657 | 49928.3438 | 60082.1211 | 6.1235 | 12.9741 | 77.077 | 9.635 | 44922.0352 | 75499.0547 |
| 7500 | 0.6061 | 49928.3438 | 60082.1211 | 6.1235 | 13.0011 | 76.917 | 9.615 | 44922.0352 | 75499.0547 |
| 8000 | 0.6465 | 49928.3438 | 60082.1211 | 6.1235 | 13.021 | 76.799 | 9.6 | 44922.0352 | 75499.0547 |
| 8500 | 0.6869 | 49928.3438 | 60082.1211 | 6.1235 | 13.023 | 76.787 | 9.598 | 44922.0352 | 75499.0547 |
| 9000 | 0.7273 | 49928.3438 | 60082.1211 | 6.1235 | 12.9717 | 77.091 | 9.636 | 44922.0352 | 75499.0547 |
| 9500 | 0.7677 | 49928.3438 | 60082.1211 | 6.1235 | 13.0526 | 76.613 | 9.577 | 44922.0352 | 75499.0547 |
| 10000 | 0.8081 | 49928.3438 | 60082.1211 | 6.1235 | 12.9964 | 76.944 | 9.618 | 44922.0352 | 75499.0547 |
| 10500 | 0.8485 | 49928.3438 | 60082.1211 | 6.1235 | 12.9662 | 77.123 | 9.64 | 44922.0352 | 75499.0547 |
| 11000 | 0.8889 | 49928.3438 | 60082.1211 | 6.1235 | 12.9957 | 76.949 | 9.619 | 44922.0352 | 75499.0547 |
| 11500 | 0.9293 | 49928.3438 | 60082.1211 | 6.1235 | 12.9518 | 77.209 | 9.651 | 44922.0352 | 75499.0547 |
| 12000 | 0.9697 | 49928.3438 | 60082.1211 | 6.1235 | 12.9583 | 77.171 | 9.646 | 44922.0352 | 75499.0547 |
| 12375 | 1.0 | 49928.3438 | 60082.1211 | 6.1235 | 13.0092 | 76.869 | 9.609 | 44922.0352 | 75499.0547 |
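
The epoch column in the metrics above follows directly from the step count: the run ends at step 12375 with epoch 1.0. The sketch below reproduces that mapping and derives an implied training-set size. The size estimate assumes one optimizer step per batch of train_batch_size 8 on a single device with no gradient accumulation; none of that is stated in the card.

```python
TOTAL_STEPS = 12375      # final step logged at epoch 1.0
TRAIN_BATCH_SIZE = 8     # train_batch_size from the hyperparameters

def epoch_at(step, total_steps=TOTAL_STEPS, num_epochs=1.0):
    """Fraction of training completed at a given step, in epochs."""
    return num_epochs * step / total_steps

# Implied number of training samples seen in one epoch, assuming no
# gradient accumulation and a single device (assumption, not from the card).
implied_train_samples = TOTAL_STEPS * TRAIN_BATCH_SIZE

for step in (500, 6500, 12375):
    print(step, round(epoch_at(step), 4))
print(implied_train_samples)
```

The rounded values for steps 500 and 6500 match the logged epochs 0.0404 and 0.5253.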

Framework versions

  • Distily 0.2.0
  • Transformers 4.44.0
  • Pytorch 2.3.0
  • Datasets 2.21.0

Model size: 68.5M params (Safetensors)
Tensor type: BF16