
AmberYifan/Mistral-7B-v0.1-dpo-10k

This model is a version of mistralai/Mistral-7B-v0.1 fine-tuned with DPO (per the model name and the reward metrics below) on an unknown preference dataset. It achieves the following results on the evaluation set (a sketch of how these metrics are typically computed follows the list):

  • Loss: 0.7523
  • Rewards/real: 2.2447
  • Rewards/generated: 1.4806
  • Rewards/accuracies: 0.6154
  • Rewards/margins: 0.7641
  • Logps/generated: -106.5099
  • Logps/real: -116.4675
  • Logits/generated: -2.3563
  • Logits/real: -2.3976
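
The card does not define these metrics, but the names match the standard DPO quantities logged by TRL's `DPOTrainer`. Below is a minimal sketch, assuming that formulation, of how such numbers are derived from summed per-response log-probabilities; the function name and `beta=0.1` (TRL's default) are assumptions, not values from the card.

```python
import torch.nn.functional as F

def dpo_eval_metrics(policy_real_logps, policy_gen_logps,
                     ref_real_logps, ref_gen_logps, beta=0.1):
    # Implicit rewards: beta * log-ratio of policy to reference model,
    # for the preferred ("real") and dispreferred ("generated") responses.
    rewards_real = beta * (policy_real_logps - ref_real_logps)
    rewards_generated = beta * (policy_gen_logps - ref_gen_logps)
    margins = rewards_real - rewards_generated
    # DPO loss is -log(sigmoid(margin)); accuracy is the fraction of pairs
    # where the preferred response earns the higher implicit reward.
    loss = -F.logsigmoid(margins).mean()
    return {
        "loss": loss.item(),
        "rewards/real": rewards_real.mean().item(),
        "rewards/generated": rewards_generated.mean().item(),
        "rewards/accuracies": (margins > 0).float().mean().item(),
        "rewards/margins": margins.mean().item(),
    }
```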

Model description

More information needed

Intended uses & limitations

More information needed
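
The card gives no usage guidance; as a starting point, here is a minimal loading sketch with `transformers`, using the hub id `AmberYifan/Mistral-7B-v0.1-dpo-10k` and the BF16 weights the repository ships. The prompt is an arbitrary placeholder.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AmberYifan/Mistral-7B-v0.1-dpo-10k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```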

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of an equivalent training setup follows the list):

  • learning_rate: 5e-07
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
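
The training framework is not stated; below is a minimal sketch of an equivalent run, assuming TRL's `DPOTrainer` (whose logged metrics match those in the results table). `DPOConfig` subclasses `transformers.TrainingArguments`, so the card's hyperparameters map directly onto its fields. The two-example `preference_dataset` is a hypothetical stand-in for the unknown training data, and the argument name `tokenizer` follows TRL 0.9-era releases (newer versions call it `processing_class`).

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

# Hypothetical stand-in for the (unknown) preference dataset.
preference_dataset = Dataset.from_dict({
    "prompt": ["What is DPO?", "Name a French city."],
    "chosen": ["Direct Preference Optimization.", "Paris."],
    "rejected": ["No idea.", "London."],
})

# Run under `accelerate launch` on 4 GPUs to reproduce the card's
# effective train batch size: 4 per device x 4 devices x 2 accumulation
# steps = 32. Adam betas/epsilon are the TrainingArguments defaults.
args = DPOConfig(
    output_dir="Mistral-7B-v0.1-dpo-10k",
    learning_rate=5e-7,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,
)

trainer = DPOTrainer(
    model=model,          # reference model is cloned internally if omitted
    args=args,
    train_dataset=preference_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```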

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/real | Rewards/generated | Rewards/accuracies | Rewards/margins | Logps/generated | Logps/real | Logits/generated | Logits/real |
|:-------------:|:------:|:----:|:---------------:|:------------:|:-----------------:|:------------------:|:---------------:|:---------------:|:----------:|:----------------:|:-----------:|
| 0.74          | 0.1984 | 62   | 0.7414          | 1.1355       | 0.8829            | 0.6154             | 0.2526          | -112.4863       | -127.5589  | -2.4229          | -2.4711     |
| 0.7524        | 0.3968 | 124  | 0.7002          | 1.7305       | 1.2540            | 0.6923             | 0.4765          | -108.7756       | -121.6096  | -2.5561          | -2.5864     |
| 0.8028        | 0.5952 | 186  | 0.7025          | 1.7197       | 1.2525            | 0.6538             | 0.4673          | -108.7909       | -121.7167  | -2.4102          | -2.3984     |
| 0.7502        | 0.7936 | 248  | 0.7088          | 1.5388       | 0.9514            | 0.6346             | 0.5875          | -111.8017       | -123.5257  | -2.5032          | -2.5135     |
| 0.8621        | 0.992  | 310  | 0.7444          | 1.5171       | 1.1213            | 0.6731             | 0.3957          | -110.1023       | -123.7435  | -2.4965          | -2.5022     |
| 0.3246        | 1.1904 | 372  | 0.7215          | 2.3223       | 1.7036            | 0.6731             | 0.6187          | -104.2799       | -115.6916  | -2.5671          | -2.5848     |
| 0.3153        | 1.3888 | 434  | 0.7150          | 2.3474       | 1.7021            | 0.6538             | 0.6453          | -104.2945       | -115.4398  | -2.4999          | -2.5255     |
| 0.4053        | 1.5872 | 496  | 0.7083          | 2.2991       | 1.6619            | 0.6731             | 0.6372          | -104.6970       | -115.9233  | -2.4039          | -2.4069     |
| 0.3611        | 1.7856 | 558  | 0.7119          | 2.3331       | 1.7045            | 0.6731             | 0.6286          | -104.2702       | -115.5829  | -2.4323          | -2.4364     |
| 0.3933        | 1.984  | 620  | 0.7168          | 2.3292       | 1.7024            | 0.6731             | 0.6268          | -104.2917       | -115.6223  | -2.4321          | -2.4267     |
| 0.226         | 2.1824 | 682  | 0.7430          | 2.2194       | 1.4536            | 0.6346             | 0.7658          | -106.7797       | -116.7200  | -2.3994          | -2.4211     |
| 0.2117        | 2.3808 | 744  | 0.7449          | 2.1435       | 1.3976            | 0.5962             | 0.7459          | -107.3397       | -117.4795  | -2.4077          | -2.4527     |
| 0.2304        | 2.5792 | 806  | 0.7553          | 2.2242       | 1.4834            | 0.5769             | 0.7408          | -106.4812       | -116.6720  | -2.3411          | -2.3926     |
| 0.2423        | 2.7776 | 868  | 0.7526          | 2.2896       | 1.5597            | 0.5962             | 0.7299          | -105.7187       | -116.0179  | -2.3574          | -2.3974     |
| 0.2881        | 2.976  | 930  | 0.7523          | 2.2447       | 1.4806            | 0.6154             | 0.7641          | -106.5099       | -116.4675  | -2.3563          | -2.3976     |

Evaluation ran every 62 optimizer steps; 930 steps over 3 epochs at an effective batch size of 32 works out to roughly 9,900 training examples, consistent with the "10k" in the model name.

Framework versions

  • Transformers 4.43.3
  • PyTorch 2.2.2+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1