model_hh_usp2_200

This model is a fine-tuned version of meta-llama/Llama-2-7b-chat-hf on an unknown dataset. Its results on the evaluation set, recorded every 100 training steps, are listed in the table at the end of this card.

Model description

More information needed


Training results

Evaluation metrics recorded every 100 training steps:

Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
8.0	100	1.7627	-5.8776	-6.7947	0.5400	0.9171	-122.7167	-118.4203	-0.0739	-0.0163
16.0	200	1.7526	-5.9719	-6.9070	0.5200	0.9351	-122.8416	-118.5252	-0.0772	-0.0191
24.0	300	1.7452	-5.9893	-6.9334	0.5400	0.9440	-122.8708	-118.5445	-0.0823	-0.0239
32.0	400	1.7405	-6.0454	-7.0112	0.5400	0.9658	-122.9573	-118.6068	-0.0827	-0.0247
40.0	500	1.7542	-6.0927	-7.0508	0.5500	0.9581	-123.0013	-118.6594	-0.0849	-0.0269
48.0	600	1.7457	-6.1288	-7.0751	0.5300	0.9463	-123.0282	-118.6995	-0.0843	-0.0262
56.0	700	1.7426	-6.1364	-7.0982	0.5400	0.9619	-123.0540	-118.7079	-0.0868	-0.0288
64.0	800	1.7365	-6.1361	-7.0983	0.5600	0.9621	-123.0540	-118.7077	-0.0867	-0.0287
72.0	900	1.7559	-6.1205	-7.0808	0.5500	0.9604	-123.0346	-118.6903	-0.0874	-0.0288
80.0	1000	1.7539	-6.1342	-7.0734	0.5500	0.9392	-123.0264	-118.7056	-0.0859	-0.0281
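
The column names resemble those logged by preference-optimization trainers (for example DPO-style training), although this card does not name the training method. Under that assumption, Rewards/margins should equal Rewards/chosen minus Rewards/rejected. The sketch below copies the rows from the table, checks that relationship up to rounding, and finds the step with the lowest validation loss. The names `rows`, `margin_matches`, and the tolerance are illustrative and not part of the original card.

```python
# Rows copied from the table above:
# (step, validation loss, rewards/chosen, rewards/rejected, rewards/margins)
rows = [
    (100, 1.7627, -5.8776, -6.7947, 0.9171),
    (200, 1.7526, -5.9719, -6.9070, 0.9351),
    (300, 1.7452, -5.9893, -6.9334, 0.9440),
    (400, 1.7405, -6.0454, -7.0112, 0.9658),
    (500, 1.7542, -6.0927, -7.0508, 0.9581),
    (600, 1.7457, -6.1288, -7.0751, 0.9463),
    (700, 1.7426, -6.1364, -7.0982, 0.9619),
    (800, 1.7365, -6.1361, -7.0983, 0.9621),
    (900, 1.7559, -6.1205, -7.0808, 0.9604),
    (1000, 1.7539, -6.1342, -7.0734, 0.9392),
]


def margin_matches(chosen, rejected, margin, tol=5e-4):
    """Return True if margin equals chosen - rejected up to table rounding.

    The tolerance is an assumption: the table shows four decimal places,
    so small differences in the last digit are expected.
    """
    return abs((chosen - rejected) - margin) <= tol


# Check the assumed relationship margin = chosen - rejected on every row.
all_consistent = all(margin_matches(c, r, m) for _, _, c, r, m in rows)

# Step with the lowest validation loss among the listed evaluation steps.
best_step, best_loss = min(
    ((step, loss) for step, loss, *_ in rows), key=lambda pair: pair[1]
)

print("margins consistent:", all_consistent)
print("lowest validation loss:", best_loss, "at step", best_step)
```

Validation loss varies only in the second decimal place across the run, so the lowest-loss step found this way is a weak signal and should not be read as a clear best checkpoint.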