wmt19-ende-t5-small

This model is a fine-tuned version of t5-small on the WMT19 English-to-German (en-de) dataset. It achieves the following results on the evaluation set (a minimal usage sketch follows the list):

  • Loss: 1.5150
  • Bleu: 16.0852
  • Brevity Penalty: 0.5512
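
For reference, here is a minimal inference sketch. The checkpoint id bri25yu/wmt19-ende-t5-small appears on this card; the `translate English to German:` task prefix is the usual t5-small convention and is an assumption here, since the card does not document its preprocessing.

```python
# Minimal sketch: English-to-German translation with the fine-tuned
# checkpoint. The task prefix is the standard t5-small convention and
# an assumption; the card does not document preprocessing.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "bri25yu/wmt19-ende-t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

text = "translate English to German: The weather is nice today."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```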

Model description

A t5-small checkpoint (roughly 60.5M parameters, stored in FP32) fine-tuned for English-to-German machine translation on WMT19. Further details have not been documented.

Intended uses & limitations

The model is intended for English-to-German machine translation. Its limitations have not been documented.

Training and evaluation data

The model was trained and evaluated on the wmt19 dataset (English-German parallel text); the exact splits and preprocessing have not been documented. A loading sketch is shown below.
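
As a hedged illustration, the corresponding data can be loaded with the datasets library; whether this exact configuration and split were used is an assumption.

```python
# Sketch: loading WMT19 German-English parallel text. "de-en" is the
# wmt19 config name covering this language pair; using it (and the
# validation split) for this model is an assumption.
from datasets import load_dataset

dataset = load_dataset("wmt19", "de-en")
print(dataset["validation"][0])  # {'translation': {'de': '...', 'en': '...'}}
```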

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reconstructed training-arguments sketch follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 256
  • eval_batch_size: 512
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 512
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant
  • training_steps: 10000
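
As a rough reconstruction, these settings map onto Seq2SeqTrainingArguments as follows. This is a sketch inferred from the card, not the actual training script; the output directory, evaluation cadence, and generation settings are assumptions.

```python
# Hedged reconstruction of the training configuration from the list
# above. Values not on the card (output_dir, eval cadence,
# predict_with_generate) are assumptions; Adam betas=(0.9, 0.999) and
# epsilon=1e-8 are the Transformers defaults, so they are not set here.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="wmt19-ende-t5-small",  # assumed
    learning_rate=1e-4,
    per_device_train_batch_size=256,
    per_device_eval_batch_size=512,
    gradient_accumulation_steps=2,     # effective train batch size: 512
    seed=42,
    lr_scheduler_type="constant",
    max_steps=10_000,
    evaluation_strategy="steps",       # assumed; the card reports eval every 100 steps
    eval_steps=100,
    predict_with_generate=True,        # assumed; needed to compute BLEU during eval
)
```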

Training results

| Training Loss | Epoch | Step  | Validation Loss | Bleu    | Brevity Penalty |
|:-------------:|:-----:|:-----:|:---------------:|:-------:|:---------------:|
| 2.7369        | 0.01  | 100   | 2.0018          | 9.0851  | 0.5107          |
| 3.3896        | 0.02  | 200   | 1.9108          | 9.9970  | 0.5127          |
| 3.0442        | 0.03  | 300   | 1.8627          | 10.7670 | 0.5245          |
| 2.5136        | 0.04  | 400   | 1.8244          | 10.9280 | 0.5132          |
| 2.4092        | 0.05  | 500   | 1.7951          | 11.4717 | 0.5260          |
| 3.2441        | 0.06  | 600   | 1.7736          | 11.7350 | 0.5197          |
| 2.6997        | 0.07  | 700   | 1.7563          | 12.0741 | 0.5260          |
| 2.5072        | 0.08  | 800   | 1.7416          | 12.3735 | 0.5283          |
| 2.3788        | 0.09  | 900   | 1.7267          | 12.4288 | 0.5285          |
| 2.3533        | 0.1   | 1000  | 1.7247          | 12.4395 | 0.5249          |
| 2.2911        | 0.11  | 1100  | 1.7078          | 12.3887 | 0.5201          |
| 2.3949        | 0.12  | 1200  | 1.6997          | 12.8109 | 0.5288          |
| 2.2343        | 0.13  | 1300  | 1.6930          | 12.8213 | 0.5283          |
| 2.2525        | 0.14  | 1400  | 1.6851          | 13.1221 | 0.5285          |
| 2.2604        | 0.15  | 1500  | 1.6795          | 13.0896 | 0.5261          |
| 2.3146        | 0.16  | 1600  | 1.6723          | 13.1741 | 0.5291          |
| 2.5767        | 0.17  | 1700  | 1.6596          | 13.4224 | 0.5248          |
| 2.698         | 0.18  | 1800  | 1.6576          | 13.6733 | 0.5334          |
| 2.6416        | 0.19  | 1900  | 1.6514          | 13.7184 | 0.5350          |
| 3.0841        | 0.2   | 2000  | 1.6448          | 13.9079 | 0.5357          |
| 2.5039        | 0.21  | 2100  | 1.6375          | 13.9860 | 0.5361          |
| 2.5829        | 0.22  | 2200  | 1.6366          | 13.9246 | 0.5328          |
| 2.5332        | 0.23  | 2300  | 1.6348          | 13.4895 | 0.5209          |
| 2.5832        | 0.24  | 2400  | 1.6240          | 14.0445 | 0.5349          |
| 2.8577        | 0.25  | 2500  | 1.6182          | 14.1085 | 0.5344          |
| 2.9157        | 0.26  | 2600  | 1.6285          | 13.7982 | 0.5365          |
| 2.6758        | 0.27  | 2700  | 1.6249          | 13.8638 | 0.5392          |
| 2.0391        | 0.28  | 2800  | 1.6205          | 13.9645 | 0.5396          |
| 2.8146        | 0.29  | 2900  | 1.6210          | 14.2823 | 0.5409          |
| 2.6602        | 0.3   | 3000  | 1.6219          | 13.9663 | 0.5391          |
| 1.7745        | 0.31  | 3100  | 1.6088          | 14.4206 | 0.5413          |
| 2.3483        | 0.32  | 3200  | 1.6050          | 14.6208 | 0.5471          |
| 1.9911        | 0.33  | 3300  | 1.6004          | 14.5458 | 0.5396          |
| 1.8973        | 0.34  | 3400  | 1.5985          | 14.5387 | 0.5400          |
| 2.6956        | 0.35  | 3500  | 1.6005          | 14.7482 | 0.5458          |
| 2.322         | 0.36  | 3600  | 1.5949          | 14.7322 | 0.5448          |
| 1.5147        | 0.37  | 3700  | 1.5966          | 14.8456 | 0.5431          |
| 2.0606        | 0.38  | 3800  | 1.5899          | 14.6267 | 0.5333          |
| 3.0341        | 0.39  | 3900  | 1.5842          | 14.7705 | 0.5414          |
| 1.5069        | 0.4   | 4000  | 1.5911          | 14.6861 | 0.5372          |
| 2.339         | 0.41  | 4100  | 1.5949          | 14.6970 | 0.5481          |
| 2.5221        | 0.42  | 4200  | 1.5870          | 14.6996 | 0.5403          |
| 1.6398        | 0.43  | 4300  | 1.5790          | 14.8826 | 0.5431          |
| 2.2758        | 0.44  | 4400  | 1.5818          | 14.5580 | 0.5375          |
| 2.2622        | 0.45  | 4500  | 1.5821          | 15.0062 | 0.5428          |
| 1.3329        | 0.46  | 4600  | 1.5792          | 14.7609 | 0.5377          |
| 1.7537        | 0.47  | 4700  | 1.5744          | 15.1037 | 0.5425          |
| 2.5379        | 0.48  | 4800  | 1.5756          | 15.2684 | 0.5479          |
| 2.1236        | 0.49  | 4900  | 1.5822          | 14.8229 | 0.5478          |
| 2.9621        | 0.5   | 5000  | 1.5747          | 14.9948 | 0.5443          |
| 1.9832        | 0.51  | 5100  | 1.5838          | 14.8682 | 0.5468          |
| 1.4962        | 0.52  | 5200  | 1.5836          | 14.8094 | 0.5397          |
| 2.4318        | 0.53  | 5300  | 1.5826          | 14.8213 | 0.5422          |
| 1.9338        | 0.54  | 5400  | 1.5869          | 14.5571 | 0.5402          |
| 1.404         | 0.55  | 5500  | 1.5891          | 14.5103 | 0.5414          |
| 2.2803        | 0.56  | 5600  | 1.5864          | 14.6338 | 0.5417          |
| 2.3725        | 0.57  | 5700  | 1.5893          | 14.3405 | 0.5385          |
| 1.1436        | 0.58  | 5800  | 1.5703          | 15.3309 | 0.5457          |
| 2.1695        | 0.59  | 5900  | 1.5690          | 15.3571 | 0.5438          |
| 1.7295        | 0.6   | 6000  | 1.5653          | 15.3547 | 0.5421          |
| 1.3033        | 0.61  | 6100  | 1.5649          | 15.3084 | 0.5442          |
| 2.396         | 0.62  | 6200  | 1.5592          | 15.5594 | 0.5440          |
| 2.133         | 0.63  | 6300  | 1.5634          | 15.3689 | 0.5420          |
| 1.1775        | 0.64  | 6400  | 1.5639          | 15.4869 | 0.5389          |
| 2.0793        | 0.65  | 6500  | 1.5541          | 15.6320 | 0.5453          |
| 1.7569        | 0.66  | 6600  | 1.5588          | 15.7405 | 0.5429          |
| 1.1035        | 0.67  | 6700  | 1.5520          | 15.7011 | 0.5450          |
| 1.5799        | 0.68  | 6800  | 1.5517          | 15.9203 | 0.5490          |
| 1.7737        | 0.69  | 6900  | 1.5473          | 15.8992 | 0.5480          |
| 1.3071        | 0.7   | 7000  | 1.5491          | 15.7140 | 0.5446          |
| 2.2214        | 0.71  | 7100  | 1.5460          | 15.9360 | 0.5479          |
| 1.7848        | 0.72  | 7200  | 1.5431          | 15.9338 | 0.5490          |
| 1.1231        | 0.73  | 7300  | 1.5398          | 15.8774 | 0.5444          |
| 1.7741        | 0.74  | 7400  | 1.5399          | 15.9724 | 0.5451          |
| 1.7098        | 0.75  | 7500  | 1.5361          | 15.9098 | 0.5447          |
| 1.0787        | 0.76  | 7600  | 1.5393          | 15.9781 | 0.5457          |
| 1.9856        | 0.77  | 7700  | 1.5348          | 15.9521 | 0.5462          |
| 2.1294        | 0.78  | 7800  | 1.5345          | 16.0042 | 0.5463          |
| 1.1938        | 0.79  | 7900  | 1.5314          | 16.0554 | 0.5495          |
| 1.9579        | 0.8   | 8000  | 1.5307          | 15.9349 | 0.5482          |
| 1.844         | 0.81  | 8100  | 1.5285          | 15.8589 | 0.5448          |
| 1.1464        | 0.82  | 8200  | 1.5413          | 15.9210 | 0.5435          |
| 2.2903        | 0.83  | 8300  | 1.5230          | 16.0164 | 0.5405          |
| 2.1489        | 0.84  | 8400  | 1.5263          | 15.9423 | 0.5443          |
| 1.8138        | 0.85  | 8500  | 1.5350          | 15.8267 | 0.5464          |
| 2.4025        | 0.86  | 8600  | 1.5275          | 15.8493 | 0.5430          |
| 1.6758        | 0.87  | 8700  | 1.5206          | 15.9246 | 0.5464          |
| 1.3671        | 0.88  | 8800  | 1.5235          | 15.9662 | 0.5460          |
| 2.3341        | 0.89  | 8900  | 1.5221          | 16.0465 | 0.5456          |
| 1.8405        | 0.9   | 9000  | 1.5201          | 16.0834 | 0.5454          |
| 1.4133        | 0.91  | 9100  | 1.5250          | 15.8619 | 0.5442          |
| 2.4374        | 0.92  | 9200  | 1.5261          | 15.8174 | 0.5429          |
| 1.3627        | 0.93  | 9300  | 1.5257          | 15.7541 | 0.5450          |
| 1.5003        | 0.94  | 9400  | 1.5249          | 15.9109 | 0.5463          |
| 2.2002        | 0.95  | 9500  | 1.5252          | 15.8338 | 0.5434          |
| 2.3461        | 0.96  | 9600  | 1.5262          | 15.9195 | 0.5469          |
| 1.2607        | 0.97  | 9700  | 1.5197          | 15.8370 | 0.5459          |
| 2.3737        | 0.98  | 9800  | 1.5178          | 16.0579 | 0.5475          |
| 1.3968        | 0.99  | 9900  | 1.5132          | 16.1729 | 0.5522          |
| 1.1816        | 1.0   | 10000 | 1.5150          | 16.0852 | 0.5512          |
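
The Bleu and Brevity Penalty columns above can be reproduced with sacrebleu. Note that the brevity penalty stays around 0.55 throughout training, meaning the model's outputs are markedly shorter than the references, which depresses BLEU. A hedged sketch follows; the exact metric configuration used for this card is not documented, so sacrebleu defaults are assumed.

```python
# Sketch: computing corpus BLEU and its brevity penalty with sacrebleu.
# Defaults are assumed; toy data stands in for real model outputs.
import sacrebleu

predictions = ["Das Wetter ist schön."]              # decoded model outputs
references = [["Das Wetter ist heute sehr schön."]]  # one reference stream

result = sacrebleu.corpus_bleu(predictions, references)
print(f"Bleu: {result.score:.4f}")          # the card reports 16.0852 on eval
print(f"Brevity Penalty: {result.bp:.4f}")  # the card reports 0.5512
```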

Framework versions

  • Transformers 4.30.2
  • Pytorch 2.0.1+cu118
  • Datasets 2.12.0
  • Tokenizers 0.13.3