# wmt19-ende-t5-small

This model is a fine-tuned version of [t5-small](https://huggingface.co/t5-small) on the wmt19 dataset (English-to-German). It achieves the following results on the evaluation set:

- Loss: 1.5150
- Bleu: 16.0852
- Brevity Penalty: 0.5512
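A brevity penalty well below 1.0 indicates that the model's translations are noticeably shorter than the references, which depresses the BLEU score. As a sketch (the `brevity_penalty` helper and the inverted-ratio calculation are illustrative, not part of the original evaluation code), the standard BLEU brevity penalty and the length ratio implied by the reported value can be computed like this:

```python
import math

def brevity_penalty(candidate_len, reference_len):
    """Standard BLEU brevity penalty: 1.0 if the candidate is at least
    as long as the reference, exp(1 - r/c) otherwise."""
    if candidate_len >= reference_len:
        return 1.0
    return math.exp(1.0 - reference_len / candidate_len)

# Inverting BP = exp(1 - r/c) gives c/r = 1 / (1 - ln(BP)).
# Plugging in the reported BP of 0.5512:
bp = 0.5512
length_ratio = 1.0 / (1.0 - math.log(bp))
print(round(length_ratio, 3))  # 0.627: hypotheses are roughly 63% of reference length
```

This suggests the score could likely be improved by tuning generation length (e.g. `max_length` or a length penalty at decode time) rather than by further training alone.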
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 256
- eval_batch_size: 512
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 512
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- training_steps: 10000
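The total train batch size above is not set directly but follows from the per-device batch size and gradient accumulation. A quick sanity check of that arithmetic (the ~5.12M-example figure is an inference from the results table below, where 10,000 steps corresponds to epoch 1.0, not a number stated in the original card):

```python
# Effective batch size = per-device batch size * gradient accumulation steps.
train_batch_size = 256
gradient_accumulation_steps = 2
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 512

# If 10,000 optimizer steps cover exactly one epoch, the training split
# holds roughly total_train_batch_size * training_steps sentence pairs.
training_steps = 10_000
examples_seen = total_train_batch_size * training_steps
print(examples_seen)  # 5120000
```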
### Training results
| Training Loss | Epoch | Step | Validation Loss | Bleu | Brevity Penalty |
|---|---|---|---|---|---|
2.7369 | 0.01 | 100 | 2.0018 | 9.0851 | 0.5107 |
3.3896 | 0.02 | 200 | 1.9108 | 9.9970 | 0.5127 |
3.0442 | 0.03 | 300 | 1.8627 | 10.7670 | 0.5245 |
2.5136 | 0.04 | 400 | 1.8244 | 10.9280 | 0.5132 |
2.4092 | 0.05 | 500 | 1.7951 | 11.4717 | 0.5260 |
3.2441 | 0.06 | 600 | 1.7736 | 11.7350 | 0.5197 |
2.6997 | 0.07 | 700 | 1.7563 | 12.0741 | 0.5260 |
2.5072 | 0.08 | 800 | 1.7416 | 12.3735 | 0.5283 |
2.3788 | 0.09 | 900 | 1.7267 | 12.4288 | 0.5285 |
2.3533 | 0.1 | 1000 | 1.7247 | 12.4395 | 0.5249 |
2.2911 | 0.11 | 1100 | 1.7078 | 12.3887 | 0.5201 |
2.3949 | 0.12 | 1200 | 1.6997 | 12.8109 | 0.5288 |
2.2343 | 0.13 | 1300 | 1.6930 | 12.8213 | 0.5283 |
2.2525 | 0.14 | 1400 | 1.6851 | 13.1221 | 0.5285 |
2.2604 | 0.15 | 1500 | 1.6795 | 13.0896 | 0.5261 |
2.3146 | 0.16 | 1600 | 1.6723 | 13.1741 | 0.5291 |
2.5767 | 0.17 | 1700 | 1.6596 | 13.4224 | 0.5248 |
2.698 | 0.18 | 1800 | 1.6576 | 13.6733 | 0.5334 |
2.6416 | 0.19 | 1900 | 1.6514 | 13.7184 | 0.5350 |
3.0841 | 0.2 | 2000 | 1.6448 | 13.9079 | 0.5357 |
2.5039 | 0.21 | 2100 | 1.6375 | 13.9860 | 0.5361 |
2.5829 | 0.22 | 2200 | 1.6366 | 13.9246 | 0.5328 |
2.5332 | 0.23 | 2300 | 1.6348 | 13.4895 | 0.5209 |
2.5832 | 0.24 | 2400 | 1.6240 | 14.0445 | 0.5349 |
2.8577 | 0.25 | 2500 | 1.6182 | 14.1085 | 0.5344 |
2.9157 | 0.26 | 2600 | 1.6285 | 13.7982 | 0.5365 |
2.6758 | 0.27 | 2700 | 1.6249 | 13.8638 | 0.5392 |
2.0391 | 0.28 | 2800 | 1.6205 | 13.9645 | 0.5396 |
2.8146 | 0.29 | 2900 | 1.6210 | 14.2823 | 0.5409 |
2.6602 | 0.3 | 3000 | 1.6219 | 13.9663 | 0.5391 |
1.7745 | 0.31 | 3100 | 1.6088 | 14.4206 | 0.5413 |
2.3483 | 0.32 | 3200 | 1.6050 | 14.6208 | 0.5471 |
1.9911 | 0.33 | 3300 | 1.6004 | 14.5458 | 0.5396 |
1.8973 | 0.34 | 3400 | 1.5985 | 14.5387 | 0.5400 |
2.6956 | 0.35 | 3500 | 1.6005 | 14.7482 | 0.5458 |
2.322 | 0.36 | 3600 | 1.5949 | 14.7322 | 0.5448 |
1.5147 | 0.37 | 3700 | 1.5966 | 14.8456 | 0.5431 |
2.0606 | 0.38 | 3800 | 1.5899 | 14.6267 | 0.5333 |
3.0341 | 0.39 | 3900 | 1.5842 | 14.7705 | 0.5414 |
1.5069 | 0.4 | 4000 | 1.5911 | 14.6861 | 0.5372 |
2.339 | 0.41 | 4100 | 1.5949 | 14.6970 | 0.5481 |
2.5221 | 0.42 | 4200 | 1.5870 | 14.6996 | 0.5403 |
1.6398 | 0.43 | 4300 | 1.5790 | 14.8826 | 0.5431 |
2.2758 | 0.44 | 4400 | 1.5818 | 14.5580 | 0.5375 |
2.2622 | 0.45 | 4500 | 1.5821 | 15.0062 | 0.5428 |
1.3329 | 0.46 | 4600 | 1.5792 | 14.7609 | 0.5377 |
1.7537 | 0.47 | 4700 | 1.5744 | 15.1037 | 0.5425 |
2.5379 | 0.48 | 4800 | 1.5756 | 15.2684 | 0.5479 |
2.1236 | 0.49 | 4900 | 1.5822 | 14.8229 | 0.5478 |
2.9621 | 0.5 | 5000 | 1.5747 | 14.9948 | 0.5443 |
1.9832 | 0.51 | 5100 | 1.5838 | 14.8682 | 0.5468 |
1.4962 | 0.52 | 5200 | 1.5836 | 14.8094 | 0.5397 |
2.4318 | 0.53 | 5300 | 1.5826 | 14.8213 | 0.5422 |
1.9338 | 0.54 | 5400 | 1.5869 | 14.5571 | 0.5402 |
1.404 | 0.55 | 5500 | 1.5891 | 14.5103 | 0.5414 |
2.2803 | 0.56 | 5600 | 1.5864 | 14.6338 | 0.5417 |
2.3725 | 0.57 | 5700 | 1.5893 | 14.3405 | 0.5385 |
1.1436 | 0.58 | 5800 | 1.5703 | 15.3309 | 0.5457 |
2.1695 | 0.59 | 5900 | 1.5690 | 15.3571 | 0.5438 |
1.7295 | 0.6 | 6000 | 1.5653 | 15.3547 | 0.5421 |
1.3033 | 0.61 | 6100 | 1.5649 | 15.3084 | 0.5442 |
2.396 | 0.62 | 6200 | 1.5592 | 15.5594 | 0.5440 |
2.133 | 0.63 | 6300 | 1.5634 | 15.3689 | 0.5420 |
1.1775 | 0.64 | 6400 | 1.5639 | 15.4869 | 0.5389 |
2.0793 | 0.65 | 6500 | 1.5541 | 15.6320 | 0.5453 |
1.7569 | 0.66 | 6600 | 1.5588 | 15.7405 | 0.5429 |
1.1035 | 0.67 | 6700 | 1.5520 | 15.7011 | 0.5450 |
1.5799 | 0.68 | 6800 | 1.5517 | 15.9203 | 0.5490 |
1.7737 | 0.69 | 6900 | 1.5473 | 15.8992 | 0.5480 |
1.3071 | 0.7 | 7000 | 1.5491 | 15.7140 | 0.5446 |
2.2214 | 0.71 | 7100 | 1.5460 | 15.9360 | 0.5479 |
1.7848 | 0.72 | 7200 | 1.5431 | 15.9338 | 0.5490 |
1.1231 | 0.73 | 7300 | 1.5398 | 15.8774 | 0.5444 |
1.7741 | 0.74 | 7400 | 1.5399 | 15.9724 | 0.5451 |
1.7098 | 0.75 | 7500 | 1.5361 | 15.9098 | 0.5447 |
1.0787 | 0.76 | 7600 | 1.5393 | 15.9781 | 0.5457 |
1.9856 | 0.77 | 7700 | 1.5348 | 15.9521 | 0.5462 |
2.1294 | 0.78 | 7800 | 1.5345 | 16.0042 | 0.5463 |
1.1938 | 0.79 | 7900 | 1.5314 | 16.0554 | 0.5495 |
1.9579 | 0.8 | 8000 | 1.5307 | 15.9349 | 0.5482 |
1.844 | 0.81 | 8100 | 1.5285 | 15.8589 | 0.5448 |
1.1464 | 0.82 | 8200 | 1.5413 | 15.9210 | 0.5435 |
2.2903 | 0.83 | 8300 | 1.5230 | 16.0164 | 0.5405 |
2.1489 | 0.84 | 8400 | 1.5263 | 15.9423 | 0.5443 |
1.8138 | 0.85 | 8500 | 1.5350 | 15.8267 | 0.5464 |
2.4025 | 0.86 | 8600 | 1.5275 | 15.8493 | 0.5430 |
1.6758 | 0.87 | 8700 | 1.5206 | 15.9246 | 0.5464 |
1.3671 | 0.88 | 8800 | 1.5235 | 15.9662 | 0.5460 |
2.3341 | 0.89 | 8900 | 1.5221 | 16.0465 | 0.5456 |
1.8405 | 0.9 | 9000 | 1.5201 | 16.0834 | 0.5454 |
1.4133 | 0.91 | 9100 | 1.5250 | 15.8619 | 0.5442 |
2.4374 | 0.92 | 9200 | 1.5261 | 15.8174 | 0.5429 |
1.3627 | 0.93 | 9300 | 1.5257 | 15.7541 | 0.5450 |
1.5003 | 0.94 | 9400 | 1.5249 | 15.9109 | 0.5463 |
2.2002 | 0.95 | 9500 | 1.5252 | 15.8338 | 0.5434 |
2.3461 | 0.96 | 9600 | 1.5262 | 15.9195 | 0.5469 |
1.2607 | 0.97 | 9700 | 1.5197 | 15.8370 | 0.5459 |
2.3737 | 0.98 | 9800 | 1.5178 | 16.0579 | 0.5475 |
1.3968 | 0.99 | 9900 | 1.5132 | 16.1729 | 0.5522 |
1.1816 | 1.0 | 10000 | 1.5150 | 16.0852 | 0.5512 |
### Framework versions
- Transformers 4.30.2
- Pytorch 2.0.1+cu118
- Datasets 2.12.0
- Tokenizers 0.13.3