Qwen2-7B-GGUF / Qwen2-7B-Q2_K.mmlu.pro.txt

	multiple_choice_score: there are 70 tasks in prompt
	multiple_choice_score: reading tasks......................................................................done
	multiple_choice_score: preparing task data......................................................................done
	multiple_choice_score : calculating TruthfulQA score over 70 tasks.

	task acc_norm
	1 0.00000000
	2 0.00000000
	3 0.00000000
	4 0.00000000
	5 0.00000000
	6 0.00000000
	7 0.00000000
	8 0.00000000
	9 0.00000000
	10 0.00000000
	11 0.00000000
	12 0.00000000
	13 0.00000000
	14 0.00000000
	15 6.66666667
	16 6.25000000
	17 5.88235294
	18 5.55555556
	19 5.26315789
	20 5.00000000
	21 4.76190476
	22 4.54545455
	23 4.34782609
	24 4.16666667
	25 4.00000000
	26 3.84615385
	27 3.70370370
	28 3.57142857
	29 3.44827586
	30 6.66666667
	31 6.45161290
	32 6.25000000
	33 9.09090909
	34 8.82352941
	35 8.57142857
	36 8.33333333
	37 8.10810811
	38 7.89473684
	39 7.69230769
	40 7.50000000
	41 7.31707317
	42 7.14285714
	43 6.97674419
	44 6.81818182
	45 6.66666667
	46 8.69565217
	47 8.51063830
	48 10.41666667
	49 10.20408163
	50 12.00000000
	51 11.76470588
	52 11.53846154
	53 13.20754717
	54 14.81481481
	55 16.36363636
	56 16.07142857
	57 15.78947368
	58 15.51724138
	59 15.25423729
	60 15.00000000
	61 14.75409836
	62 14.51612903
	63 14.28571429
	64 15.62500000
	65 15.38461538
	66 15.15151515
	67 14.92537313
	68 14.70588235
	69 14.49275362
	70 14.28571429

	Final result: 14.2857 +/- 4.2126
	Random chance: 10.0000 +/- 3.6116