[Feedback welcome] Add evaluation results to model card metadata

#40
opened by Wauplin (HF staff)
Files changed (1)
  1. README.md +161 -4
README.md CHANGED
@@ -1,9 +1,6 @@
 ---
 tags:
 - generated_from_trainer
-model-index:
-- name: zephyr-7b-beta
-  results: []
 license: mit
 datasets:
 - HuggingFaceH4/ultrachat_200k
@@ -16,6 +13,166 @@ widget:
   output:
     text: "Arr! 'Tis a puzzlin' matter, me hearty! A llama on yer lawn be a rare sight, but I've got a plan that might help ye get rid of 'im. Ye'll need to gather some carrots and hay, and then lure the llama away with the promise of a tasty treat. Once he's gone, ye can clean up yer lawn and enjoy the peace and quiet once again. But beware, me hearty, for there may be more llamas where that one came from! Arr!"
 pipeline_tag: text-generation
+model-index:
+- name: zephyr-7b-beta
+  results:
+  # AI2 Reasoning Challenge (25-Shot)
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      name: normalized accuracy
+      value: 0.6203071672354948
+    source:
+      name: Open LLM Leaderboard
+      url: https://huggingface.co/datasets/open-llm-leaderboard/details_HuggingFaceH4__zephyr-7b-beta_public
+
+  # HellaSwag (10-shot)
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      name: normalized accuracy
+      value: 0.8435570603465445
+    source:
+      name: Open LLM Leaderboard
+      url: https://huggingface.co/datasets/open-llm-leaderboard/details_HuggingFaceH4__zephyr-7b-beta_public
+
+  # DROP (3-shot)
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Drop (3-Shot)
+      type: drop
+      split: validation
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: f1
+      name: f1 score
+      value: 0.09662437080536909
+    source:
+      name: Open LLM Leaderboard
+      url: https://huggingface.co/datasets/open-llm-leaderboard/details_HuggingFaceH4__zephyr-7b-beta_public
+
+  # TruthfulQA (0-shot)
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 0.5744916942762855
+    source:
+      name: Open LLM Leaderboard
+      url: https://huggingface.co/datasets/open-llm-leaderboard/details_HuggingFaceH4__zephyr-7b-beta_public
+
+  # GSM8k (5-shot)
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      name: accuracy
+      value: 0.12736921910538287
+    source:
+      name: Open LLM Leaderboard
+      url: https://huggingface.co/datasets/open-llm-leaderboard/details_HuggingFaceH4__zephyr-7b-beta_public
+
+  # MMLU (5-Shot)
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      name: accuracy
+      value: 0.6107
+    source:
+      name: Open LLM Leaderboard
+      url: https://huggingface.co/datasets/open-llm-leaderboard/details_HuggingFaceH4__zephyr-7b-beta_public
+
+  # Winogrande (5-shot)
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      name: accuracy
+      value: 0.7774269928966061
+    source:
+      name: Open LLM Leaderboard
+      url: https://huggingface.co/datasets/open-llm-leaderboard/details_HuggingFaceH4__zephyr-7b-beta_public
+
+  # AlpacaEval (taken from model card)
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AlpacaEval
+      type: tatsu-lab/alpaca_eval
+    metrics:
+    - type: unknown
+      name: win rate
+      value: 0.9060
+    source:
+      url: https://tatsu-lab.github.io/alpaca_eval/
+
+  # MT-Bench (taken from model card)
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MT-Bench
+      type: unknown
+    metrics:
+    - type: unknown
+      name: score
+      value: 7.34
+    source:
+      url: https://huggingface.co/spaces/lmsys/mt-bench
 ---
 
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -250,4 +407,4 @@ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-le
 | TruthfulQA (0-shot) | 57.45 |
 | Winogrande (5-shot) | 77.74 |
 | GSM8K (5-shot) | 12.74 |
-| DROP (3-shot) | 9.66 |
+| DROP (3-shot) | 9.66 |
 
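For readers who want to publish similar results on their own model cards, this kind of `model-index` block can also be generated and proposed programmatically. The sketch below is illustrative only (it is not necessarily how this PR was produced) and assumes the `EvalResult`, `ModelCardData`, and `metadata_update` helpers from `huggingface_hub`; only the ARC entry is spelled out, with the values copied from the diff above, and a token with write access is needed for the final call.

```python
# Minimal sketch: build one eval entry and propose it as a metadata-only PR.
from huggingface_hub import EvalResult, ModelCardData, metadata_update

# One EvalResult per benchmark; the values below are copied from the diff above.
arc = EvalResult(
    task_type="text-generation",
    task_name="Text Generation",
    dataset_type="ai2_arc",
    dataset_name="AI2 Reasoning Challenge (25-Shot)",
    dataset_config="ARC-Challenge",
    dataset_split="test",
    dataset_args={"num_few_shot": 25},
    metric_type="acc_norm",
    metric_name="normalized accuracy",
    metric_value=0.6203071672354948,
    source_name="Open LLM Leaderboard",
    source_url="https://huggingface.co/datasets/open-llm-leaderboard/details_HuggingFaceH4__zephyr-7b-beta_public",
)

# ModelCardData serializes eval_results into the `model-index:` block shown above.
card_data = ModelCardData(model_name="zephyr-7b-beta", eval_results=[arc])

# metadata_update merges the new keys into the existing README front matter;
# create_pr=True opens a pull request instead of committing to main directly.
metadata_update(
    "HuggingFaceH4/zephyr-7b-beta",
    card_data.to_dict(),
    create_pr=True,
)
```

Once merged, the results become machine-readable: `ModelCard.load("HuggingFaceH4/zephyr-7b-beta").data.eval_results` returns the same entries as `EvalResult` objects, and the Hub can surface them on the model page.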