RewardModelSmallerQuestionWithTwoLabelsLengthJustified

This model is a fine-tuned version of roberta-large. The training dataset is not specified. Per-epoch results on the evaluation set are listed in the table below.

Model description

More information needed

Training and evaluation metrics, logged once per epoch (145 steps per epoch) for 30 epochs:

Training Loss	Epoch	Step	Validation Loss	F1	Roc Auc	Accuracy
0.7105	1.0	145	0.6814	0.5260	0.5192	0.5048
0.6899	2.0	290	0.6530	0.6090	0.6102	0.6038
0.6703	3.0	435	0.6318	0.6387	0.6565	0.6070
0.6432	4.0	580	0.6098	0.6961	0.7029	0.6805
0.6273	5.0	725	0.5909	0.7118	0.7141	0.7061
0.64	6.0	870	0.5837	0.7038	0.7029	0.6965
0.6178	7.0	1015	0.5829	0.7005	0.6981	0.6869
0.6342	8.0	1160	0.5855	0.6785	0.6805	0.6741
0.583	9.0	1305	0.5549	0.7310	0.7284	0.7188
0.5801	10.0	1450	0.5805	0.6710	0.6773	0.6581
0.6279	11.0	1595	0.6581	0.6003	0.6022	0.5974
0.6112	12.0	1740	0.5382	0.7372	0.7380	0.7348
0.5967	13.0	1885	0.6305	0.6443	0.6438	0.6422
0.5927	14.0	2030	0.6144	0.6613	0.6645	0.6550
0.5968	15.0	2175	0.5825	0.6901	0.6901	0.6901
0.6122	16.0	2320	0.5858	0.6815	0.6805	0.6773
0.5941	17.0	2465	0.5719	0.6979	0.7013	0.6901
0.5977	18.0	2610	0.6043	0.6699	0.6709	0.6677
0.59	19.0	2755	0.5465	0.7203	0.7220	0.7157
0.5871	20.0	2900	0.6474	0.6262	0.6262	0.6262
0.5932	21.0	3045	0.5701	0.6945	0.6965	0.6901
0.5966	22.0	3190	0.5281	0.7387	0.7412	0.7316
0.6006	23.0	3335	0.5713	0.6945	0.6965	0.6869
0.5696	24.0	3480	0.6498	0.6242	0.6230	0.6198
0.5921	25.0	3625	0.6453	0.6359	0.6342	0.6294
0.5761	26.0	3770	0.5226	0.7528	0.7524	0.7508
0.5504	27.0	3915	0.5793	0.6751	0.6725	0.6645
0.5891	28.0	4060	0.5248	0.7539	0.7508	0.7380
0.5757	29.0	4205	0.5983	0.6699	0.6693	0.6677
0.5631	30.0	4350	0.6187	0.6454	0.6454	0.6454
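
The validation metrics above fluctuate from epoch to epoch instead of improving steadily, so it can help to look up which epoch scored best on each metric. The following is a minimal sketch that does this for the rows of the table. The `ROWS` list and the `best_epoch` helper are illustrative names introduced here, not part of the original training code, and the card does not say which checkpoint was actually published.

```python
# Rows copied from the table above:
# (epoch, training loss, validation loss, F1, ROC AUC, accuracy)
ROWS = [
    (1, 0.7105, 0.6814, 0.5260, 0.5192, 0.5048),
    (2, 0.6899, 0.6530, 0.6090, 0.6102, 0.6038),
    (3, 0.6703, 0.6318, 0.6387, 0.6565, 0.6070),
    (4, 0.6432, 0.6098, 0.6961, 0.7029, 0.6805),
    (5, 0.6273, 0.5909, 0.7118, 0.7141, 0.7061),
    (6, 0.6400, 0.5837, 0.7038, 0.7029, 0.6965),
    (7, 0.6178, 0.5829, 0.7005, 0.6981, 0.6869),
    (8, 0.6342, 0.5855, 0.6785, 0.6805, 0.6741),
    (9, 0.5830, 0.5549, 0.7310, 0.7284, 0.7188),
    (10, 0.5801, 0.5805, 0.6710, 0.6773, 0.6581),
    (11, 0.6279, 0.6581, 0.6003, 0.6022, 0.5974),
    (12, 0.6112, 0.5382, 0.7372, 0.7380, 0.7348),
    (13, 0.5967, 0.6305, 0.6443, 0.6438, 0.6422),
    (14, 0.5927, 0.6144, 0.6613, 0.6645, 0.6550),
    (15, 0.5968, 0.5825, 0.6901, 0.6901, 0.6901),
    (16, 0.6122, 0.5858, 0.6815, 0.6805, 0.6773),
    (17, 0.5941, 0.5719, 0.6979, 0.7013, 0.6901),
    (18, 0.5977, 0.6043, 0.6699, 0.6709, 0.6677),
    (19, 0.5900, 0.5465, 0.7203, 0.7220, 0.7157),
    (20, 0.5871, 0.6474, 0.6262, 0.6262, 0.6262),
    (21, 0.5932, 0.5701, 0.6945, 0.6965, 0.6901),
    (22, 0.5966, 0.5281, 0.7387, 0.7412, 0.7316),
    (23, 0.6006, 0.5713, 0.6945, 0.6965, 0.6869),
    (24, 0.5696, 0.6498, 0.6242, 0.6230, 0.6198),
    (25, 0.5921, 0.6453, 0.6359, 0.6342, 0.6294),
    (26, 0.5761, 0.5226, 0.7528, 0.7524, 0.7508),
    (27, 0.5504, 0.5793, 0.6751, 0.6725, 0.6645),
    (28, 0.5891, 0.5248, 0.7539, 0.7508, 0.7380),
    (29, 0.5757, 0.5983, 0.6699, 0.6693, 0.6677),
    (30, 0.5631, 0.6187, 0.6454, 0.6454, 0.6454),
]

# Column indices into each row tuple.
VAL_LOSS, F1, ROC_AUC, ACCURACY = 2, 3, 4, 5


def best_epoch(rows, column, lower_is_better=False):
    """Return the epoch number whose value in `column` is best."""
    pick = min if lower_is_better else max
    return pick(rows, key=lambda row: row[column])[0]


best_by_loss = best_epoch(ROWS, VAL_LOSS, lower_is_better=True)
best_by_f1 = best_epoch(ROWS, F1)
best_by_roc_auc = best_epoch(ROWS, ROC_AUC)
best_by_accuracy = best_epoch(ROWS, ACCURACY)

print(best_by_loss, best_by_f1, best_by_roc_auc, best_by_accuracy)
# prints: 26 28 26 26
```

By these numbers, epoch 26 has the lowest validation loss (0.5226) and the highest ROC AUC and accuracy, while epoch 28 has a marginally higher F1 (0.7539). The final epoch (30) scores noticeably lower on every validation metric, so the last checkpoint is not the strongest one in this run.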