NeuralNovel leaderboard-pt-pr-bot committed on
Commit
b27e4fa
1 Parent(s): f87a539

Adding the Open Portuguese LLM Leaderboard Evaluation Results (#3)

- Adding the Open Portuguese LLM Leaderboard Evaluation Results (16b9da221f21826674ba90f42651a47c099858b8)


Co-authored-by: Open PT LLM Leaderboard PR Bot <[email protected]>

Files changed (1)
  1. README.md +163 -1
README.md CHANGED
@@ -1,8 +1,8 @@
  ---
  license: other
+ base_model: jondurbin/bagel-34b-v0.2
  license_name: yi-license
  license_link: https://huggingface.co/01-ai/Yi-34B-200K/blob/main/LICENSE
- base_model: jondurbin/bagel-34b-v0.2
  model-index:
  - name: Luminex-34B-v0.1
    results:
@@ -106,6 +106,150 @@ model-index:
      source:
        url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=ConvexAI/Luminex-34B-v0.1
        name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: ENEM Challenge (No Images)
+       type: eduagarcia/enem_challenge
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 72.01
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=ConvexAI/Luminex-34B-v0.1
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BLUEX (No Images)
+       type: eduagarcia-temp/BLUEX_without_images
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 64.81
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=ConvexAI/Luminex-34B-v0.1
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: OAB Exams
+       type: eduagarcia/oab_exams
+       split: train
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc
+       value: 54.49
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=ConvexAI/Luminex-34B-v0.1
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 RTE
+       type: assin2
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 91.91
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=ConvexAI/Luminex-34B-v0.1
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Assin2 STS
+       type: eduagarcia/portuguese_benchmark
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: pearson
+       value: 81.31
+       name: pearson
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=ConvexAI/Luminex-34B-v0.1
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: FaQuAD NLI
+       type: ruanchaves/faquad-nli
+       split: test
+       args:
+         num_few_shot: 15
+     metrics:
+     - type: f1_macro
+       value: 82.27
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=ConvexAI/Luminex-34B-v0.1
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HateBR Binary
+       type: ruanchaves/hatebr
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 69.84
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=ConvexAI/Luminex-34B-v0.1
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: PT Hate Speech Binary
+       type: hate_speech_portuguese
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 70.81
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=ConvexAI/Luminex-34B-v0.1
+       name: Open Portuguese LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: tweetSentBR
+       type: eduagarcia/tweetsentbr_fewshot
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: f1_macro
+       value: 67.44
+       name: f1-macro
+     source:
+       url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=ConvexAI/Luminex-34B-v0.1
+       name: Open Portuguese LLM Leaderboard
  ---
 
  ![image/png](https://i.ibb.co/9VB5SHL/OIG1-3.jpg)
@@ -137,3 +281,21 @@ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-le
  |Winogrande (5-shot) |83.43|
  |GSM8k (5-shot) |72.48|
 
+
+ # Open Portuguese LLM Leaderboard Evaluation Results
+
+ Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/ConvexAI/Luminex-34B-v0.1) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
+
+ | Metric | Value |
+ |--------------------------|---------|
+ |Average |**72.76**|
+ |ENEM Challenge (No Images)| 72.01|
+ |BLUEX (No Images) | 64.81|
+ |OAB Exams | 54.49|
+ |Assin2 RTE | 91.91|
+ |Assin2 STS | 81.31|
+ |FaQuAD NLI | 82.27|
+ |HateBR Binary | 69.84|
+ |PT Hate Speech Binary | 70.81|
+ |tweetSentBR | 67.44|
+
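
The "Average" row in the added table can be sanity-checked against the nine per-task scores. The sketch below assumes the average is an unweighted arithmetic mean over the nine tasks; the commit does not state how the leaderboard computes it, and the `scores` dictionary and `mean_score` helper are illustrative names, not part of the leaderboard's code. The mean of the rounded values comes out near 72.77, while the table reports 72.76; one possible explanation (again an assumption) is that the leaderboard averages unrounded scores.

```python
# Scores copied from the table this commit adds to README.md.
scores = {
    "ENEM Challenge (No Images)": 72.01,
    "BLUEX (No Images)": 64.81,
    "OAB Exams": 54.49,
    "Assin2 RTE": 91.91,
    "Assin2 STS": 81.31,
    "FaQuAD NLI": 82.27,
    "HateBR Binary": 69.84,
    "PT Hate Speech Binary": 70.81,
    "tweetSentBR": 67.44,
}


def mean_score(values):
    """Return the unweighted arithmetic mean of the given scores."""
    values = list(values)
    return sum(values) / len(values)


# Assumption: the reported average weights every task equally.
average = mean_score(scores.values())
reported = 72.76  # value shown in the "Average" row

print(f"Mean of rounded task scores: {average:.4f}")
print(f"Difference from reported average: {average - reported:+.4f}")
```

A difference below 0.01 is consistent with rounding of the per-task values rather than a different weighting scheme.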