# wmt19-ende-t5-small

This model is a fine-tuned version of [t5-small](https://huggingface.co/t5-small) on the wmt19 dataset (English-to-German). It achieves the following results on the evaluation set:

- Loss: 1.5150
- Bleu: 16.0852
- Brevity Penalty: 0.5512
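A brevity penalty well below 1.0 indicates that the model's translations are noticeably shorter than the references, which depresses the BLEU score. As a sketch (the `brevity_penalty` helper and the inverted-ratio calculation are illustrative, not part of the original evaluation code), the standard BLEU brevity penalty and the length ratio implied by the reported value can be computed like this:

```python
import math

def brevity_penalty(candidate_len, reference_len):
    """Standard BLEU brevity penalty: 1.0 if the candidate is at least
    as long as the reference, exp(1 - r/c) otherwise."""
    if candidate_len >= reference_len:
        return 1.0
    return math.exp(1.0 - reference_len / candidate_len)

# Inverting BP = exp(1 - r/c) gives c/r = 1 / (1 - ln(BP)).
# Plugging in the reported BP of 0.5512:
bp = 0.5512
length_ratio = 1.0 / (1.0 - math.log(bp))
print(round(length_ratio, 3))  # 0.627: hypotheses are roughly 63% of reference length
```

This suggests the score could likely be improved by tuning generation length (e.g. `max_length` or a length penalty at decode time) rather than by further training alone.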
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 256
- eval_batch_size: 512
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 512
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- training_steps: 10000
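The total train batch size above is not set directly but follows from the per-device batch size and gradient accumulation. A quick sanity check of that arithmetic (the ~5.12M-example figure is an inference from the results table below, where 10,000 steps corresponds to epoch 1.0, not a number stated in the original card):

```python
# Effective batch size = per-device batch size * gradient accumulation steps.
train_batch_size = 256
gradient_accumulation_steps = 2
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 512

# If 10,000 optimizer steps cover exactly one epoch, the training split
# holds roughly total_train_batch_size * training_steps sentence pairs.
training_steps = 10_000
examples_seen = total_train_batch_size * training_steps
print(examples_seen)  # 5120000
```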
### Training results
| Training Loss | Epoch | Step | Validation Loss | Bleu | Brevity Penalty |
|---|---|---|---|---|---|
2.7369 | 0.01 | 100 | 2.0018 | 9.0851 | 0.5107 |
3.3896 | 0.02 | 200 | 1.9108 | 9.9970 | 0.5127 |
3.0442 | 0.03 | 300 | 1.8627 | 10.7670 | 0.5245 |
2.5136 | 0.04 | 400 | 1.8244 | 10.9280 | 0.5132 |
2.4092 | 0.05 | 500 | 1.7951 | 11.4717 | 0.5260 |
3.2441 | 0.06 | 600 | 1.7736 | 11.7350 | 0.5197 |
2.6997 | 0.07 | 700 | 1.7563 | 12.0741 | 0.5260 |
2.5072 | 0.08 | 800 | 1.7416 | 12.3735 | 0.5283 |
2.3788 | 0.09 | 900 | 1.7267 | 12.4288 | 0.5285 |
2.3533 | 0.1 | 1000 | 1.7247 | 12.4395 | 0.5249 |
2.2911 | 0.11 | 1100 | 1.7078 | 12.3887 | 0.5201 |
2.3949 | 0.12 | 1200 | 1.6997 | 12.8109 | 0.5288 |
2.2343 | 0.13 | 1300 | 1.6930 | 12.8213 | 0.5283 |
2.2525 | 0.14 | 1400 | 1.6851 | 13.1221 | 0.5285 |
2.2604 | 0.15 | 1500 | 1.6795 | 13.0896 | 0.5261 |
2.3146 | 0.16 | 1600 | 1.6723 | 13.1741 | 0.5291 |
2.5767 | 0.17 | 1700 | 1.6596 | 13.4224 | 0.5248 |
2.698 | 0.18 | 1800 | 1.6576 | 13.6733 | 0.5334 |
2.6416 | 0.19 | 1900 | 1.6514 | 13.7184 | 0.5350 |
3.0841 | 0.2 | 2000 | 1.6448 | 13.9079 | 0.5357 |
2.5039 | 0.21 | 2100 | 1.6375 | 13.9860 | 0.5361 |
2.5829 | 0.22 | 2200 | 1.6366 | 13.9246 | 0.5328 |
2.5332 | 0.23 | 2300 | 1.6348 | 13.4895 | 0.5209 |
2.5832 | 0.24 | 2400 | 1.6240 | 14.0445 | 0.5349 |
2.8577 | 0.25 | 2500 | 1.6182 | 14.1085 | 0.5344 |
2.9157 | 0.26 | 2600 | 1.6285 | 13.7982 | 0.5365 |
2.6758 | 0.27 | 2700 | 1.6249 | 13.8638 | 0.5392 |
2.0391 | 0.28 | 2800 | 1.6205 | 13.9645 | 0.5396 |
2.8146 | 0.29 | 2900 | 1.6210 | 14.2823 | 0.5409 |
2.6602 | 0.3 | 3000 | 1.6219 | 13.9663 | 0.5391 |
1.7745 | 0.31 | 3100 | 1.6088 | 14.4206 | 0.5413 |
2.3483 | 0.32 | 3200 | 1.6050 | 14.6208 | 0.5471 |
1.9911 | 0.33 | 3300 | 1.6004 | 14.5458 | 0.5396 |
1.8973 | 0.34 | 3400 | 1.5985 | 14.5387 | 0.5400 |
2.6956 | 0.35 | 3500 | 1.6005 | 14.7482 | 0.5458 |
2.322 | 0.36 | 3600 | 1.5949 | 14.7322 | 0.5448 |
1.5147 | 0.37 | 3700 | 1.5966 | 14.8456 | 0.5431 |
2.0606 | 0.38 | 3800 | 1.5899 | 14.6267 | 0.5333 |
3.0341 | 0.39 | 3900 | 1.5842 | 14.7705 | 0.5414 |
1.5069 | 0.4 | 4000 | 1.5911 | 14.6861 | 0.5372 |
2.339 | 0.41 | 4100 | 1.5949 | 14.6970 | 0.5481 |
2.5221 | 0.42 | 4200 | 1.5870 | 14.6996 | 0.5403 |
1.6398 | 0.43 | 4300 | 1.5790 | 14.8826 | 0.5431 |
2.2758 | 0.44 | 4400 | 1.5818 | 14.5580 | 0.5375 |
2.2622 | 0.45 | 4500 | 1.5821 | 15.0062 | 0.5428 |
1.3329 | 0.46 | 4600 | 1.5792 | 14.7609 | 0.5377 |
1.7537 | 0.47 | 4700 | 1.5744 | 15.1037 | 0.5425 |
2.5379 | 0.48 | 4800 | 1.5756 | 15.2684 | 0.5479 |
2.1236 | 0.49 | 4900 | 1.5822 | 14.8229 | 0.5478 |
2.9621 | 0.5 | 5000 | 1.5747 | 14.9948 | 0.5443 |
1.9832 | 0.51 | 5100 | 1.5838 | 14.8682 | 0.5468 |
1.4962 | 0.52 | 5200 | 1.5836 | 14.8094 | 0.5397 |
2.4318 | 0.53 | 5300 | 1.5826 | 14.8213 | 0.5422 |
1.9338 | 0.54 | 5400 | 1.5869 | 14.5571 | 0.5402 |
1.404 | 0.55 | 5500 | 1.5891 | 14.5103 | 0.5414 |
2.2803 | 0.56 | 5600 | 1.5864 | 14.6338 | 0.5417 |
2.3725 | 0.57 | 5700 | 1.5893 | 14.3405 | 0.5385 |
1.1436 | 0.58 | 5800 | 1.5703 | 15.3309 | 0.5457 |
2.1695 | 0.59 | 5900 | 1.5690 | 15.3571 | 0.5438 |
1.7295 | 0.6 | 6000 | 1.5653 | 15.3547 | 0.5421 |
1.3033 | 0.61 | 6100 | 1.5649 | 15.3084 | 0.5442 |
2.396 | 0.62 | 6200 | 1.5592 | 15.5594 | 0.5440 |
2.133 | 0.63 | 6300 | 1.5634 | 15.3689 | 0.5420 |
1.1775 | 0.64 | 6400 | 1.5639 | 15.4869 | 0.5389 |
2.0793 | 0.65 | 6500 | 1.5541 | 15.6320 | 0.5453 |
1.7569 | 0.66 | 6600 | 1.5588 | 15.7405 | 0.5429 |
1.1035 | 0.67 | 6700 | 1.5520 | 15.7011 | 0.5450 |
1.5799 | 0.68 | 6800 | 1.5517 | 15.9203 | 0.5490 |
1.7737 | 0.69 | 6900 | 1.5473 | 15.8992 | 0.5480 |
1.3071 | 0.7 | 7000 | 1.5491 | 15.7140 | 0.5446 |
2.2214 | 0.71 | 7100 | 1.5460 | 15.9360 | 0.5479 |
1.7848 | 0.72 | 7200 | 1.5431 | 15.9338 | 0.5490 |
1.1231 | 0.73 | 7300 | 1.5398 | 15.8774 | 0.5444 |
1.7741 | 0.74 | 7400 | 1.5399 | 15.9724 | 0.5451 |
1.7098 | 0.75 | 7500 | 1.5361 | 15.9098 | 0.5447 |
1.0787 | 0.76 | 7600 | 1.5393 | 15.9781 | 0.5457 |
1.9856 | 0.77 | 7700 | 1.5348 | 15.9521 | 0.5462 |
2.1294 | 0.78 | 7800 | 1.5345 | 16.0042 | 0.5463 |
1.1938 | 0.79 | 7900 | 1.5314 | 16.0554 | 0.5495 |
1.9579 | 0.8 | 8000 | 1.5307 | 15.9349 | 0.5482 |
1.844 | 0.81 | 8100 | 1.5285 | 15.8589 | 0.5448 |
1.1464 | 0.82 | 8200 | 1.5413 | 15.9210 | 0.5435 |
2.2903 | 0.83 | 8300 | 1.5230 | 16.0164 | 0.5405 |
2.1489 | 0.84 | 8400 | 1.5263 | 15.9423 | 0.5443 |
1.8138 | 0.85 | 8500 | 1.5350 | 15.8267 | 0.5464 |
2.4025 | 0.86 | 8600 | 1.5275 | 15.8493 | 0.5430 |
1.6758 | 0.87 | 8700 | 1.5206 | 15.9246 | 0.5464 |
1.3671 | 0.88 | 8800 | 1.5235 | 15.9662 | 0.5460 |
2.3341 | 0.89 | 8900 | 1.5221 | 16.0465 | 0.5456 |
1.8405 | 0.9 | 9000 | 1.5201 | 16.0834 | 0.5454 |
1.4133 | 0.91 | 9100 | 1.5250 | 15.8619 | 0.5442 |
2.4374 | 0.92 | 9200 | 1.5261 | 15.8174 | 0.5429 |
1.3627 | 0.93 | 9300 | 1.5257 | 15.7541 | 0.5450 |
1.5003 | 0.94 | 9400 | 1.5249 | 15.9109 | 0.5463 |
2.2002 | 0.95 | 9500 | 1.5252 | 15.8338 | 0.5434 |
2.3461 | 0.96 | 9600 | 1.5262 | 15.9195 | 0.5469 |
1.2607 | 0.97 | 9700 | 1.5197 | 15.8370 | 0.5459 |
2.3737 | 0.98 | 9800 | 1.5178 | 16.0579 | 0.5475 |
1.3968 | 0.99 | 9900 | 1.5132 | 16.1729 | 0.5522 |
1.1816 | 1.0 | 10000 | 1.5150 | 16.0852 | 0.5512 |
### Framework versions
- Transformers 4.30.2
- Pytorch 2.0.1+cu118
- Datasets 2.12.0
- Tokenizers 0.13.3