Qwen2-7B-GGUF / Qwen2-7B-Q2_K.mmlu.pro.txt
Uploaded by fedric95 with huggingface_hub (commit b1a6865)
multiple_choice_score: there are 70 tasks in prompt
multiple_choice_score: reading tasks......................................................................done
multiple_choice_score: preparing task data......................................................................done
multiple_choice_score : calculating TruthfulQA score over 70 tasks.
task acc_norm
1 0.00000000
2 0.00000000
3 0.00000000
4 0.00000000
5 0.00000000
6 0.00000000
7 0.00000000
8 0.00000000
9 0.00000000
10 0.00000000
11 0.00000000
12 0.00000000
13 0.00000000
14 0.00000000
15 6.66666667
16 6.25000000
17 5.88235294
18 5.55555556
19 5.26315789
20 5.00000000
21 4.76190476
22 4.54545455
23 4.34782609
24 4.16666667
25 4.00000000
26 3.84615385
27 3.70370370
28 3.57142857
29 3.44827586
30 6.66666667
31 6.45161290
32 6.25000000
33 9.09090909
34 8.82352941
35 8.57142857
36 8.33333333
37 8.10810811
38 7.89473684
39 7.69230769
40 7.50000000
41 7.31707317
42 7.14285714
43 6.97674419
44 6.81818182
45 6.66666667
46 8.69565217
47 8.51063830
48 10.41666667
49 10.20408163
50 12.00000000
51 11.76470588
52 11.53846154
53 13.20754717
54 14.81481481
55 16.36363636
56 16.07142857
57 15.78947368
58 15.51724138
59 15.25423729
60 15.00000000
61 14.75409836
62 14.51612903
63 14.28571429
64 15.62500000
65 15.38461538
66 15.15151515
67 14.92537313
68 14.70588235
69 14.49275362
70 14.28571429
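The acc_norm column is a running accuracy in percent, not a per-task score: each row is (correct answers so far) / (tasks seen) times 100. The value first becomes non-zero at task 15 (1/15 = 6.67%) and ends at 10/70 = 14.29% at task 70. The Python sketch below rebuilds the column from the ten task indices where the logged value jumps upward (15, 30, 33, 46, 48, 50, 53, 54, 55, 64). Those indices are read off the table; the function and constant names are hypothetical.

```python
# Task indices (1-based) where the logged acc_norm value increases,
# read off the table above: these are the tasks answered correctly.
CORRECT_TASKS = {15, 30, 33, 46, 48, 50, 53, 54, 55, 64}
N_TASKS = 70


def running_accuracy(correct_tasks, n_tasks):
    """Return the cumulative accuracy (in %) after each task."""
    n_correct = 0
    history = []
    for task in range(1, n_tasks + 1):
        if task in correct_tasks:
            n_correct += 1
        # Accuracy so far: correct answers divided by tasks seen, times 100.
        history.append(100.0 * n_correct / task)
    return history


acc_history = running_accuracy(CORRECT_TASKS, N_TASKS)

# Print a few rows in the same "task<TAB>value" layout as the log.
for task in (15, 29, 33, 55, 70):
    print(f"{task}\t{acc_history[task - 1]:.8f}")
```

The printed rows match the corresponding log lines, for example 3.44827586 at task 29 and 14.28571429 at task 70.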
Final result: 14.2857 +/- 4.2126
Random chance: 10.0000 +/- 3.6116
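Both summary lines follow from the count of 10 correct answers out of 70 tasks. Below is a minimal sketch. It assumes the error bar is the binomial standard error sqrt(p(1 - p) / (n - 1)) and that every task offers 10 answer choices. Both assumptions were chosen because they reproduce the logged numbers, not because the scoring tool's source is quoted here. Using n instead of n - 1 would give 4.18 rather than 4.2126. The function names are hypothetical.

```python
import math


def accuracy_with_error(n_correct, n_tasks):
    """Accuracy in % and its standard error in %.

    Assumption: the denominator is (n_tasks - 1), which matches the
    logged "+/- 4.2126".
    """
    p = n_correct / n_tasks
    sigma = math.sqrt(p * (1.0 - p) / (n_tasks - 1))
    return 100.0 * p, 100.0 * sigma


def random_chance(n_tasks, n_choices):
    """Expected accuracy (in %) of uniform guessing, with its standard error.

    Assumption: every task has the same number of answer choices.
    """
    p = 1.0 / n_choices
    sigma = math.sqrt(p * (1.0 - p) / (n_tasks - 1))
    return 100.0 * p, 100.0 * sigma


acc, err = accuracy_with_error(10, 70)
rnd, rnd_err = random_chance(70, 10)
print(f"Final result: {acc:.4f} +/- {err:.4f}")
print(f"Random chance: {rnd:.4f} +/- {rnd_err:.4f}")
```

Under these assumptions the two printed lines reproduce the log's "Final result" and "Random chance" values.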