Felladrin leaderboard-pr-bot committed on
Commit 180d584 · verified · 1 Parent(s): f1b293c

Adding Evaluation Results (#1)


- Adding Evaluation Results (bbfabe136197aafa2bc0da3ac6eb441ca3b03ccf)
- Update README.md (e375c4487c7bdc1253ed7babee81955da66dd25b)


Co-authored-by: Open LLM Leaderboard PR Bot <[email protected]>

Files changed (1)
  1. README.md +164 -45
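
Part of this commit rewrites the widget examples from ChatML-formatted `text` prompts to structured `messages` lists. The following is a minimal sketch of that conversion, assuming each completed turn is delimited by `<|im_start|>` and `<|im_end|>` as in the removed lines. The helper name `chatml_to_messages` is hypothetical and is not part of the commit or of any library.

```python
import re

# One completed ChatML turn: "<|im_start|>ROLE\nCONTENT<|im_end|>".
# A trailing "<|im_start|>assistant" with no "<|im_end|>" is the generation
# prompt, not a message, so it is intentionally left unmatched.
CHATML_TURN = re.compile(r"<\|im_start\|>(\w+)\n(.*?)<\|im_end\|>", re.DOTALL)


def chatml_to_messages(prompt: str) -> list[dict[str, str]]:
    """Hypothetical helper: turn a ChatML prompt into role/content messages."""
    return [
        {"role": role, "content": content.strip()}
        for role, content in CHATML_TURN.findall(prompt)
    ]


if __name__ == "__main__":
    old_widget_text = (
        "<|im_start|>system\n"
        "You are a knowledgeable assistant. Help the user as much as you can.<|im_end|>\n"
        "<|im_start|>user\n"
        "How to become healthier?<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
    for message in chatml_to_messages(old_widget_text):
        print(message["role"], "->", message["content"])
```

The open-ended assistant turn at the end of each old prompt is dropped, which matches the new widget entries: each one ends on a `user` message.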
README.md CHANGED
@@ -1,58 +1,163 @@
1
  ---
2
- license: apache-2.0
3
  language:
4
- - en
 
5
  tags:
6
- - text-generation
7
- base_model: JackFram/llama-68m
8
  datasets:
9
- - THUDM/webglm-qa
10
- - databricks/databricks-dolly-15k
11
- - cognitivecomputations/wizard_vicuna_70k_unfiltered
12
- - totally-not-an-llm/EverythingLM-data-V3
13
- - Amod/mental_health_counseling_conversations
14
- - sablo/oasst2_curated
15
- - starfishmedical/webGPT_x_dolly
16
- - Open-Orca/OpenOrca
17
- - mlabonne/chatml_dpo_pairs
 
18
  widget:
19
- - text: |-
20
- <|im_start|>system
21
- You are a knowledgeable assistant. Help the user as much as you can.<|im_end|>
22
- <|im_start|>user
23
- How to become healthier?<|im_end|>
24
- <|im_start|>assistant
25
- - text: |-
26
- <|im_start|>system
27
- You are a career counselor. The user will provide you with an individual looking for guidance in their professional life, and your task is to assist them in determining what careers they are most suited for based on their skills, interests, and experience. You should also conduct research into the various options available, explain the job market trends in different industries, and advice on which qualifications would be beneficial for pursuing particular fields.<|im_end|>
28
- <|im_start|>user
29
- Heya!<|im_end|>
30
- <|im_start|>assistant
31
- Hi! How may I help you?<|im_end|>
32
- <|im_start|>user
33
- I am interested in developing a career in software engineering. What would you recommend me to do?<|im_end|>
34
- <|im_start|>assistant
35
- - text: |-
36
- <|im_start|>system
37
- You are a helpful assistant who provides concise responses.<|im_end|>
38
- <|im_start|>user
39
- Hi!<|im_end|>
40
- <|im_start|>assistant
41
- Hello there! How may I help you?<|im_end|>
42
- <|im_start|>user
43
- I need to build a simple website. Where should I start learning about web development?<|im_end|>
44
- <|im_start|>assistant
45
- - text: |-
46
- <|im_start|>system
47
- You are a very creative assistant. User will give you a task, which you should complete with all your knowledge.<|im_end|>
48
- <|im_start|>user
49
- Write the background story of an RPG game about wizards and dragons in a sci-fi world.<|im_end|>
50
- <|im_start|>assistant
 
 
51
  inference:
52
  parameters:
53
  max_new_tokens: 64
54
  penalty_alpha: 0.5
55
  top_k: 4
56
  ---
57
 
58
  # A Llama Chat Model of 68M Parameters
@@ -88,3 +193,17 @@ inference:
88
  penalty_alpha: 0.5
89
  top_k: 4
90
  ```
1
  ---
 
2
  language:
3
+ - en
4
+ license: apache-2.0
5
  tags:
6
+ - text-generation
 
7
  datasets:
8
+ - THUDM/webglm-qa
9
+ - databricks/databricks-dolly-15k
10
+ - cognitivecomputations/wizard_vicuna_70k_unfiltered
11
+ - totally-not-an-llm/EverythingLM-data-V3
12
+ - Amod/mental_health_counseling_conversations
13
+ - sablo/oasst2_curated
14
+ - starfishmedical/webGPT_x_dolly
15
+ - Open-Orca/OpenOrca
16
+ - mlabonne/chatml_dpo_pairs
17
+ base_model: JackFram/llama-68m
18
  widget:
19
+ - messages:
20
+   - role: system
21
+     content: You are a career counselor. The user will provide you with an individual
22
+       looking for guidance in their professional life, and your task is to assist
23
+       them in determining what careers they are most suited for based on their skills,
24
+       interests, and experience. You should also conduct research into the various
25
+       options available, explain the job market trends in different industries, and
26
+       advice on which qualifications would be beneficial for pursuing particular fields.
27
+   - role: user
28
+     content: Heya!
29
+   - role: assistant
30
+     content: Hi! How may I help you?
31
+   - role: user
32
+     content: I am interested in developing a career in software engineering. What
33
+       would you recommend me to do?
34
+ - messages:
35
+   - role: system
36
+     content: You are a knowledgeable assistant. Help the user as much as you can.
37
+   - role: user
38
+     content: How to become healthier?
39
+ - messages:
40
+   - role: system
41
+     content: You are a helpful assistant who provides concise responses.
42
+   - role: user
43
+     content: Hi!
44
+   - role: assistant
45
+     content: Hello there! How may I help you?
46
+   - role: user
47
+     content: I need to build a simple website. Where should I start learning about web development?
48
+ - messages:
49
+   - role: system
50
+     content: You are a very creative assistant. User will give you a task, which you should complete with all your knowledge.
51
+   - role: user
52
+     content: Write the background story of an RPG game about wizards and dragons in a sci-fi world.
53
  inference:
54
  parameters:
55
  max_new_tokens: 64
56
  penalty_alpha: 0.5
57
  top_k: 4
58
+ model-index:
59
+ - name: Llama-68M-Chat-v1
60
+   results:
61
+   - task:
62
+       type: text-generation
63
+       name: Text Generation
64
+     dataset:
65
+       name: AI2 Reasoning Challenge (25-Shot)
66
+       type: ai2_arc
67
+       config: ARC-Challenge
68
+       split: test
69
+       args:
70
+         num_few_shot: 25
71
+     metrics:
72
+     - type: acc_norm
73
+       value: 23.29
74
+       name: normalized accuracy
75
+     source:
76
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Llama-68M-Chat-v1
77
+       name: Open LLM Leaderboard
78
+   - task:
79
+       type: text-generation
80
+       name: Text Generation
81
+     dataset:
82
+       name: HellaSwag (10-Shot)
83
+       type: hellaswag
84
+       split: validation
85
+       args:
86
+         num_few_shot: 10
87
+     metrics:
88
+     - type: acc_norm
89
+       value: 28.27
90
+       name: normalized accuracy
91
+     source:
92
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Llama-68M-Chat-v1
93
+       name: Open LLM Leaderboard
94
+   - task:
95
+       type: text-generation
96
+       name: Text Generation
97
+     dataset:
98
+       name: MMLU (5-Shot)
99
+       type: cais/mmlu
100
+       config: all
101
+       split: test
102
+       args:
103
+         num_few_shot: 5
104
+     metrics:
105
+     - type: acc
106
+       value: 25.18
107
+       name: accuracy
108
+     source:
109
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Llama-68M-Chat-v1
110
+       name: Open LLM Leaderboard
111
+   - task:
112
+       type: text-generation
113
+       name: Text Generation
114
+     dataset:
115
+       name: TruthfulQA (0-shot)
116
+       type: truthful_qa
117
+       config: multiple_choice
118
+       split: validation
119
+       args:
120
+         num_few_shot: 0
121
+     metrics:
122
+     - type: mc2
123
+       value: 47.27
124
+     source:
125
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Llama-68M-Chat-v1
126
+       name: Open LLM Leaderboard
127
+   - task:
128
+       type: text-generation
129
+       name: Text Generation
130
+     dataset:
131
+       name: Winogrande (5-shot)
132
+       type: winogrande
133
+       config: winogrande_xl
134
+       split: validation
135
+       args:
136
+         num_few_shot: 5
137
+     metrics:
138
+     - type: acc
139
+       value: 54.3
140
+       name: accuracy
141
+     source:
142
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Llama-68M-Chat-v1
143
+       name: Open LLM Leaderboard
144
+   - task:
145
+       type: text-generation
146
+       name: Text Generation
147
+     dataset:
148
+       name: GSM8k (5-shot)
149
+       type: gsm8k
150
+       config: main
151
+       split: test
152
+       args:
153
+         num_few_shot: 5
154
+     metrics:
155
+     - type: acc
156
+       value: 0.0
157
+       name: accuracy
158
+     source:
159
+       url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=Felladrin/Llama-68M-Chat-v1
160
+       name: Open LLM Leaderboard
161
  ---
162
 
163
  # A Llama Chat Model of 68M Parameters
 
193
  penalty_alpha: 0.5
194
  top_k: 4
195
  ```
196
+
197
+ ## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
198
+
199
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Felladrin__Llama-68M-Chat-v1)
200
+
201
+ | Metric |Value|
202
+ |---------------------------------|----:|
203
+ |Avg. |29.72|
204
+ |AI2 Reasoning Challenge (25-Shot)|23.29|
205
+ |HellaSwag (10-Shot) |28.27|
206
+ |MMLU (5-Shot) |25.18|
207
+ |TruthfulQA (0-shot) |47.27|
208
+ |Winogrande (5-shot) |54.30|
209
+ |GSM8k (5-shot) | 0.00|
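
The `Avg.` row in the table added by this commit is consistent with an unweighted mean of the six benchmark scores. A small sketch to check the arithmetic, assuming a plain mean rounded to two decimals (the dictionary below simply restates the table values; the variable names are illustrative):

```python
# Scores copied from the results table in this commit.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 23.29,
    "HellaSwag (10-Shot)": 28.27,
    "MMLU (5-Shot)": 25.18,
    "TruthfulQA (0-shot)": 47.27,
    "Winogrande (5-shot)": 54.30,
    "GSM8k (5-shot)": 0.00,
}

# Unweighted mean across all benchmarks (assumed averaging rule).
average = sum(scores.values()) / len(scores)

print(f"Avg. {average:.2f}")  # prints "Avg. 29.72"
```

Note that the GSM8k score of 0.00 is included in the mean, which pulls the average well below the TruthfulQA and Winogrande scores.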