macadeliccc committed on
Commit 8bd0c89 · verified · 1 parent: f72f3cc

Update README.md

Files changed (1)
  1. README.md +80 -1
README.md CHANGED
@@ -56,7 +56,86 @@ print(generate_response(prompt), "\n")
 
 ## Evaluations
 
-TODO
+| Model |AGIEval|GPT4All|TruthfulQA|Bigbench|Average|
+|---------------------------------------------------------------------------|------:|------:|---------:|-------:|------:|
+|[SOLAR-math-2x10.7b](https://huggingface.co/macadeliccc/SOLAR-math-2x10.7b)| 47.2| 75.18| 64.73| 45.15| 58.07|
+
+### AGIEval
+| Task |Version| Metric |Value| |Stderr|
+|------------------------------|------:|--------|----:|---|-----:|
+|agieval_aqua_rat | 0|acc |30.31|± | 2.89|
+| | |acc_norm|30.31|± | 2.89|
+|agieval_logiqa_en | 0|acc |43.78|± | 1.95|
+| | |acc_norm|43.93|± | 1.95|
+|agieval_lsat_ar | 0|acc |21.74|± | 2.73|
+| | |acc_norm|19.13|± | 2.60|
+|agieval_lsat_lr | 0|acc |57.25|± | 2.19|
+| | |acc_norm|56.47|± | 2.20|
+|agieval_lsat_rc | 0|acc |68.77|± | 2.83|
+| | |acc_norm|68.03|± | 2.85|
+|agieval_sat_en | 0|acc |78.16|± | 2.89|
+| | |acc_norm|79.13|± | 2.84|
+|agieval_sat_en_without_passage| 0|acc |47.57|± | 3.49|
+| | |acc_norm|44.66|± | 3.47|
+|agieval_sat_math | 0|acc |41.36|± | 3.33|
+| | |acc_norm|35.91|± | 3.24|
+
+Average: 47.2%
+
+### GPT4All
+| Task |Version| Metric |Value| |Stderr|
+|-------------|------:|--------|----:|---|-----:|
+|arc_challenge| 0|acc |59.22|± | 1.44|
+| | |acc_norm|61.43|± | 1.42|
+|arc_easy | 0|acc |84.26|± | 0.75|
+| | |acc_norm|83.63|± | 0.76|
+|boolq | 1|acc |88.69|± | 0.55|
+|hellaswag | 0|acc |65.98|± | 0.47|
+| | |acc_norm|84.29|± | 0.36|
+|openbookqa | 0|acc |34.20|± | 2.12|
+| | |acc_norm|47.20|± | 2.23|
+|piqa | 0|acc |81.83|± | 0.90|
+| | |acc_norm|82.59|± | 0.88|
+|winogrande | 0|acc |78.45|± | 1.16|
+
+Average: 75.18%
+
+### TruthfulQA
+| Task |Version|Metric|Value| |Stderr|
+|-------------|------:|------|----:|---|-----:|
+|truthfulqa_mc| 1|mc1 |48.47|± | 1.75|
+| | |mc2 |64.73|± | 1.53|
+
+Average: 64.73%
+
+### Bigbench
+| Task |Version| Metric |Value| |Stderr|
+|------------------------------------------------|------:|---------------------|----:|---|-----:|
+|bigbench_causal_judgement | 0|multiple_choice_grade|61.05|± | 3.55|
+|bigbench_date_understanding | 0|multiple_choice_grade|68.56|± | 2.42|
+|bigbench_disambiguation_qa | 0|multiple_choice_grade|35.27|± | 2.98|
+|bigbench_geometric_shapes | 0|multiple_choice_grade|31.20|± | 2.45|
+| | |exact_str_match | 0.00|± | 0.00|
+|bigbench_logical_deduction_five_objects | 0|multiple_choice_grade|30.00|± | 2.05|
+|bigbench_logical_deduction_seven_objects | 0|multiple_choice_grade|23.43|± | 1.60|
+|bigbench_logical_deduction_three_objects | 0|multiple_choice_grade|46.00|± | 2.88|
+|bigbench_movie_recommendation | 0|multiple_choice_grade|35.60|± | 2.14|
+|bigbench_navigate | 0|multiple_choice_grade|57.50|± | 1.56|
+|bigbench_reasoning_about_colored_objects | 0|multiple_choice_grade|55.80|± | 1.11|
+|bigbench_ruin_names | 0|multiple_choice_grade|45.98|± | 2.36|
+|bigbench_salient_translation_error_detection | 0|multiple_choice_grade|40.58|± | 1.56|
+|bigbench_snarks | 0|multiple_choice_grade|66.85|± | 3.51|
+|bigbench_sports_understanding | 0|multiple_choice_grade|71.40|± | 1.44|
+|bigbench_temporal_sequences | 0|multiple_choice_grade|56.40|± | 1.57|
+|bigbench_tracking_shuffled_objects_five_objects | 0|multiple_choice_grade|24.00|± | 1.21|
+|bigbench_tracking_shuffled_objects_seven_objects| 0|multiple_choice_grade|17.09|± | 0.90|
+|bigbench_tracking_shuffled_objects_three_objects| 0|multiple_choice_grade|46.00|± | 2.88|
+
+Average: 45.15%
+
+Average score: 58.07%
+
+Elapsed time: 04:05:27
 
 
 ### 📚 Citations
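The README does not say how the summary row is computed. The per-suite averages added in this commit are, however, consistent with one simple rule, sketched below in Python as an assumption and not as the author's evaluation script. Under this rule each suite score is the unweighted mean of its task scores. The metric used per task is acc_norm where a task reports it and acc otherwise, mc2 for TruthfulQA, and multiple_choice_grade for Bigbench, skipping the geometric_shapes exact_str_match row. The overall score is then the mean of the already-rounded suite averages. Averaging the unrounded suite means instead would give 58.06, not 58.07.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-task scores (percent) copied from the tables in this commit.
# ASSUMPTION (not stated in the README): acc_norm is used where a task reports it,
# acc otherwise; TruthfulQA uses mc2 and Bigbench uses multiple_choice_grade.
SUITES = {
    "AGIEval": [30.31, 43.93, 19.13, 56.47, 68.03, 79.13, 44.66, 35.91],  # acc_norm
    "GPT4All": [
        61.43,  # arc_challenge (acc_norm)
        83.63,  # arc_easy (acc_norm)
        88.69,  # boolq (acc only)
        84.29,  # hellaswag (acc_norm)
        47.20,  # openbookqa (acc_norm)
        82.59,  # piqa (acc_norm)
        78.45,  # winogrande (acc only)
    ],
    "TruthfulQA": [64.73],  # truthfulqa_mc, mc2
    "Bigbench": [  # multiple_choice_grade for all 18 tasks
        61.05, 68.56, 35.27, 31.20, 30.00, 23.43, 46.00, 35.60, 57.50,
        55.80, 45.98, 40.58, 66.85, 71.40, 56.40, 24.00, 17.09, 46.00,
    ],
}

CENT = Decimal("0.01")


def round2(x: Decimal) -> Decimal:
    """Round half-up to two decimal places."""
    return x.quantize(CENT, rounding=ROUND_HALF_UP)


def suite_average(scores: list[float]) -> Decimal:
    """Unweighted mean of one suite's task scores, rounded to two decimals."""
    # Decimal(str(...)) keeps the two-decimal table values exact.
    exact = [Decimal(str(s)) for s in scores]
    return round2(sum(exact) / len(exact))


suite_avgs = {name: suite_average(scores) for name, scores in SUITES.items()}
# ASSUMPTION: the overall score averages the rounded suite averages.
overall = round2(sum(suite_avgs.values()) / len(suite_avgs))

for name, avg in suite_avgs.items():
    print(f"{name}: {avg}")
print(f"Average score: {overall}")
```

`Decimal` with `ROUND_HALF_UP` is used because the overall mean of the rounded suite averages lands exactly on 58.065. Binary floats combined with Python's built-in `round()` may not resolve that tie the same way the README does.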