Adding Evaluation Results

#2
Files changed (1)
  1. README.md +135 -15
README.md CHANGED
@@ -1,10 +1,12 @@
  ---
- license: apache-2.0
- datasets:
- - Locutusque/hercules-v1.0
  language:
  - en
  base_model: M4-ai/TinyMistral-6x248M
  inference:
    parameters:
      do_sample: true
@@ -14,21 +16,126 @@ inference:
      max_new_tokens: 250
      repetition_penalty: 1.1
  widget:
- - text: |
-     <|im_start|>user
      Write me a Python program that calculates the factorial of n. <|im_end|>
      <|im_start|>assistant
- - text: >-
-     An emerging clinical approach to treat substance abuse disorders involves a
-     form of cognitive-behavioral therapy whereby addicts learn to reduce their
-     reactivity to drug-paired stimuli through cue-exposure or extinction
-     training. It is, however,
- - text: |
-     <|im_start|>user
      How do I say hello in Spanish? <|im_end|>
      <|im_start|>assistant
- tags:
- - moe
  ---
  # Model Card for M4-ai/TinyMistral-6x248M-Instruct

@@ -95,4 +202,17 @@ The model has been fine-tuned on the hercules-v1.0 dataset, which contains conte

  ## Contributions

- Thanks to @jtatman, @aloobun, @Felladrin, and @Locutusque for their contributions to this model.
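The widget prompts wrap the user turn in `<|im_start|>`/`<|im_end|>` markers and leave an open `<|im_start|>assistant` turn for the model to complete. The old side writes them as YAML literal block scalars (`|`), while the new side uses multi-line single-quoted scalars in which each blank line stands for a line break; under YAML's folding rules both forms should decode to the same string. A minimal Python sketch that rebuilds that string for one user message is below. The helper name `build_widget_prompt` is hypothetical, and the spacing (one space before `<|im_end|>`, a newline after each marker line) is copied from the widget text in this diff rather than from any documented template for the model.

```python
def build_widget_prompt(user_message: str) -> str:
    """Format a single user turn the way the widget examples in this card do.

    Hypothetical helper: the layout mirrors the widget strings in the diff
    (a space before <|im_end|>, and the assistant turn left open at the end).
    """
    return (
        "<|im_start|>user\n"
        f"{user_message} <|im_end|>\n"
        "<|im_start|>assistant\n"
    )


if __name__ == "__main__":
    # repr() makes the embedded newlines visible.
    print(repr(build_widget_prompt("How do I say hello in Spanish?")))
```

Keeping the prompt construction in one function makes it easy to compare against the widget entries when the front matter is re-serialized, as happens in this PR.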
  ---
  language:
  - en
+ license: apache-2.0
+ tags:
+ - moe
  base_model: M4-ai/TinyMistral-6x248M
+ datasets:
+ - Locutusque/hercules-v1.0
  inference:
    parameters:
      do_sample: true
 
      max_new_tokens: 250
      repetition_penalty: 1.1
  widget:
+ - text: '<|im_start|>user
+
      Write me a Python program that calculates the factorial of n. <|im_end|>
+
      <|im_start|>assistant
+
+     '
+ - text: An emerging clinical approach to treat substance abuse disorders involves
+     a form of cognitive-behavioral therapy whereby addicts learn to reduce their reactivity
+     to drug-paired stimuli through cue-exposure or extinction training. It is, however,
+ - text: '<|im_start|>user
+
      How do I say hello in Spanish? <|im_end|>
+
      <|im_start|>assistant
+
+     '
+ model-index:
+ - name: TinyMistral-6x248M-Instruct
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: AI2 Reasoning Challenge (25-Shot)
+       type: ai2_arc
+       config: ARC-Challenge
+       split: test
+       args:
+         num_few_shot: 25
+     metrics:
+     - type: acc_norm
+       value: 22.44
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=M4-ai/TinyMistral-6x248M-Instruct
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: HellaSwag (10-Shot)
+       type: hellaswag
+       split: validation
+       args:
+         num_few_shot: 10
+     metrics:
+     - type: acc_norm
+       value: 27.02
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=M4-ai/TinyMistral-6x248M-Instruct
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU (5-Shot)
+       type: cais/mmlu
+       config: all
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 24.13
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=M4-ai/TinyMistral-6x248M-Instruct
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: TruthfulQA (0-shot)
+       type: truthful_qa
+       config: multiple_choice
+       split: validation
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: mc2
+       value: 43.16
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=M4-ai/TinyMistral-6x248M-Instruct
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: Winogrande (5-shot)
+       type: winogrande
+       config: winogrande_xl
+       split: validation
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 50.59
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=M4-ai/TinyMistral-6x248M-Instruct
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GSM8k (5-shot)
+       type: gsm8k
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 0.0
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=M4-ai/TinyMistral-6x248M-Instruct
+       name: Open LLM Leaderboard
  ---
  # Model Card for M4-ai/TinyMistral-6x248M-Instruct

 

  ## Contributions

+ Thanks to @jtatman, @aloobun, @Felladrin, and @Locutusque for their contributions to this model.
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_M4-ai__TinyMistral-6x248M-Instruct)
+
+ |              Metric             |Value|
+ |---------------------------------|----:|
+ |Avg.                             |27.89|
+ |AI2 Reasoning Challenge (25-Shot)|22.44|
+ |HellaSwag (10-Shot)              |27.02|
+ |MMLU (5-Shot)                    |24.13|
+ |TruthfulQA (0-shot)              |43.16|
+ |Winogrande (5-shot)              |50.59|
+ |GSM8k (5-shot)                   | 0.00|
+
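The `Avg.` row in the added table is the unweighted mean of the six benchmark values recorded under `model-index`: (22.44 + 27.02 + 24.13 + 43.16 + 50.59 + 0.00) / 6 = 167.34 / 6 = 27.89. A short Python sketch that recomputes it and renders rows in the same fixed-width layout is below. The values are hand-copied from the `metrics` entries in this diff into plain tuples, standing in for actually parsing the YAML front matter (an assumption), and `leaderboard_rows` is a hypothetical helper, not part of any leaderboard tooling.

```python
from statistics import mean

# Hand-copied from the model-index entries: (dataset name, metric type, value).
RESULTS = [
    ("AI2 Reasoning Challenge (25-Shot)", "acc_norm", 22.44),
    ("HellaSwag (10-Shot)", "acc_norm", 27.02),
    ("MMLU (5-Shot)", "acc", 24.13),
    ("TruthfulQA (0-shot)", "mc2", 43.16),
    ("Winogrande (5-shot)", "acc", 50.59),
    ("GSM8k (5-shot)", "acc", 0.0),
]


def leaderboard_rows(results):
    """Return markdown table rows: an Avg. row, then one row per benchmark."""
    values = [value for _, _, value in results]
    avg = mean(values)  # unweighted mean; every benchmark counts equally
    # Pad the name column to the longest benchmark name, as the table does.
    width = max(len(name) for name, _, _ in results)
    rows = [f"|{'Avg.':<{width}}|{avg:5.2f}|"]
    for name, _, value in results:
        rows.append(f"|{name:<{width}}|{value:5.2f}|")
    return rows


if __name__ == "__main__":
    for row in leaderboard_rows(RESULTS):
        print(row)
```

Because the average is unweighted, the GSM8k score of 0.00 pulls it down by the same amount as any other benchmark would; the mix of metric types (`acc`, `acc_norm`, `mc2`) is averaged as-is.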