v3_mistral_lora

This model is a fine-tuned version of peiyi9979/math-shepherd-mistral-7b-prm on an unknown dataset. It achieves the following results on the evaluation set at the end of training (step 1180):

Loss: 0.0001
Accuracy: 1.0
Precision: 1.0
Recall: 1.0
F1: 1.0

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 8
eval_batch_size: 8
seed: 8569382
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 2
total_train_batch_size: 64
total_eval_batch_size: 32
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 1
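
The two totals above follow from the per-device values, and the warmup ratio together with the cosine scheduler determines the learning rate at each step. The following is a minimal sketch of that arithmetic, not the training script itself. It assumes the run had 1180 optimizer steps (the last step in the training results table) and the usual shape of linear warmup followed by cosine decay to zero. The function name `lr_at` is hypothetical.

```python
import math

# Values copied from the hyperparameter list above.
per_device_train_batch = 8
per_device_eval_batch = 8
num_devices = 4
grad_accum_steps = 2
base_lr = 2e-5
warmup_ratio = 0.1
total_steps = 1180  # assumption: last step shown in the training results table

# Effective batch sizes; gradient accumulation only applies to training.
total_train_batch = per_device_train_batch * num_devices * grad_accum_steps
total_eval_batch = per_device_eval_batch * num_devices

# Number of warmup steps implied by the warmup ratio.
warmup_steps = math.ceil(total_steps * warmup_ratio)


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print(total_train_batch, total_eval_batch, warmup_steps)
for step in (0, 59, 118, 649, 1180):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions the peak rate of 2e-05 is reached at step 118 and has decayed to half of that by step 649.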

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy	Precision	Recall	F1
No log	0	0	0.4941	0.7546	0.6364	0.2526	0.3616
0.483	0.0169	20	0.4898	0.7645	0.7	0.2526	0.3712
0.5491	0.0339	40	0.4646	0.7716	0.7705	0.2423	0.3686
0.3868	0.0508	60	0.3927	0.8014	0.8462	0.3402	0.4853
0.2752	0.0678	80	0.2430	0.9149	0.9589	0.7216	0.8235
0.1319	0.0847	100	0.0990	0.9716	0.9531	0.9433	0.9482
0.0971	0.1017	120	0.0422	0.9915	0.9747	0.9948	0.9847
0.0478	0.1186	140	0.0120	0.9986	1.0	0.9948	0.9974
0.0373	0.1356	160	0.0099	0.9986	1.0	0.9948	0.9974
0.0357	0.1525	180	0.0073	0.9972	0.9948	0.9948	0.9948
0.0147	0.1695	200	0.0105	0.9986	1.0	0.9948	0.9974
0.0271	0.1864	220	0.0075	0.9986	1.0	0.9948	0.9974
0.0071	0.2034	240	0.0073	0.9986	1.0	0.9948	0.9974
0.009	0.2203	260	0.0021	1.0	1.0	1.0	1.0
0.0288	0.2373	280	0.0015	1.0	1.0	1.0	1.0
0.0236	0.2542	300	0.0011	1.0	1.0	1.0	1.0
0.0053	0.2712	320	0.0008	1.0	1.0	1.0	1.0
0.0028	0.2881	340	0.0004	1.0	1.0	1.0	1.0
0.015	0.3051	360	0.0004	1.0	1.0	1.0	1.0
0.0446	0.3220	380	0.0006	1.0	1.0	1.0	1.0
0.0261	0.3390	400	0.0003	1.0	1.0	1.0	1.0
0.0032	0.3559	420	0.0002	1.0	1.0	1.0	1.0
0.0413	0.3729	440	0.0002	1.0	1.0	1.0	1.0
0.0189	0.3898	460	0.0004	1.0	1.0	1.0	1.0
0.003	0.4068	480	0.0002	1.0	1.0	1.0	1.0
0.0071	0.4237	500	0.0004	1.0	1.0	1.0	1.0
0.0139	0.4407	520	0.0005	1.0	1.0	1.0	1.0
0.0161	0.4576	540	0.0003	1.0	1.0	1.0	1.0
0.0027	0.4746	560	0.0002	1.0	1.0	1.0	1.0
0.0039	0.4915	580	0.0003	1.0	1.0	1.0	1.0
0.0067	0.5085	600	0.0001	1.0	1.0	1.0	1.0
0.012	0.5254	620	0.0001	1.0	1.0	1.0	1.0
0.006	0.5424	640	0.0001	1.0	1.0	1.0	1.0
0.0025	0.5593	660	0.0001	1.0	1.0	1.0	1.0
0.0055	0.5763	680	0.0001	1.0	1.0	1.0	1.0
0.0116	0.5932	700	0.0001	1.0	1.0	1.0	1.0
0.014	0.6102	720	0.0001	1.0	1.0	1.0	1.0
0.0042	0.6271	740	0.0001	1.0	1.0	1.0	1.0
0.0418	0.6441	760	0.0003	1.0	1.0	1.0	1.0
0.0024	0.6610	780	0.0002	1.0	1.0	1.0	1.0
0.0039	0.6780	800	0.0002	1.0	1.0	1.0	1.0
0.0048	0.6949	820	0.0001	1.0	1.0	1.0	1.0
0.0007	0.7119	840	0.0001	1.0	1.0	1.0	1.0
0.0014	0.7288	860	0.0001	1.0	1.0	1.0	1.0
0.0056	0.7458	880	0.0001	1.0	1.0	1.0	1.0
0.0107	0.7627	900	0.0001	1.0	1.0	1.0	1.0
0.0027	0.7797	920	0.0001	1.0	1.0	1.0	1.0
0.0105	0.7966	940	0.0001	1.0	1.0	1.0	1.0
0.0157	0.8136	960	0.0001	1.0	1.0	1.0	1.0
0.0082	0.8305	980	0.0001	1.0	1.0	1.0	1.0
0.0084	0.8475	1000	0.0001	1.0	1.0	1.0	1.0
0.0182	0.8644	1020	0.0001	1.0	1.0	1.0	1.0
0.0053	0.8814	1040	0.0001	1.0	1.0	1.0	1.0
0.0087	0.8983	1060	0.0001	1.0	1.0	1.0	1.0
0.0017	0.9153	1080	0.0001	1.0	1.0	1.0	1.0
0.0058	0.9322	1100	0.0001	1.0	1.0	1.0	1.0
0.0015	0.9492	1120	0.0001	1.0	1.0	1.0	1.0
0.0059	0.9661	1140	0.0001	1.0	1.0	1.0	1.0
0.0069	0.9831	1160	0.0001	1.0	1.0	1.0	1.0
0.0058	1.0	1180	0.0001	1.0	1.0	1.0	1.0
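
The accuracy, precision, recall and F1 columns are tied together through the confusion-matrix counts. The sketch below recomputes all four from such counts. This card does not report a confusion matrix, so the counts used (49 true positives, 28 false positives, 145 false negatives, 483 true negatives) are an assumption: they are one set inferred to be consistent with the step-0 row. The helper name `binary_metrics` is hypothetical.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1 for a binary classifier."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Assumed counts (inferred from the step-0 row, not reported in the card).
metrics = binary_metrics(tp=49, fp=28, fn=145, tn=483)
print({name: round(value, 4) for name, value in metrics.items()})
```

With these assumed counts the rounded values match the step-0 row: accuracy 0.7546, precision 0.6364, recall 0.2526, F1 0.3616.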

Framework versions

PEFT 0.13.2
Transformers 4.46.0
Pytorch 2.5.1+cu124
Datasets 3.1.0
Tokenizers 0.20.3
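
To reproduce the run it may help to compare a local environment against the versions listed above. The following is a minimal sketch using the standard library's `importlib.metadata`. The distribution names are an assumption about how these packages are installed from PyPI (PyTorch as `torch`), and the helper name `version_mismatches` is hypothetical.

```python
from importlib import metadata

# Versions listed above, keyed by assumed PyPI distribution name.
EXPECTED = {
    "peft": "0.13.2",
    "transformers": "4.46.0",
    "torch": "2.5.1+cu124",
    "datasets": "3.1.0",
    "tokenizers": "0.20.3",
}


def version_mismatches(expected, get_version=metadata.version):
    """Return {package: (expected, installed or None)} for packages that differ."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in version_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {installed or 'not installed'}")
```

The `get_version` parameter exists only so the lookup can be swapped out, for example in a test; by default the check reads the versions installed in the current environment.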
