sethuiyer committed on
Commit 411b9a3 · verified · 1 Parent(s): 5dde2c4

Adding Evaluation Results

This is an automated PR created with https://huggingface.co./spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co./spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1): README.md +109 -0
README.md CHANGED
@@ -1,6 +1,101 @@
 ---
 base_model:
 - unsloth/Meta-Llama-3.1-8B
+model-index:
+- name: Llama-3.1-8B-Experimental-1206-Instruct
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: HuggingFaceH4/ifeval
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 69.67
+      name: strict accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/Llama-3.1-8B-Experimental-1206-Instruct
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: BBH
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 30.06
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/Llama-3.1-8B-Experimental-1206-Instruct
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: hendrycks/competition_math
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 11.1
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/Llama-3.1-8B-Experimental-1206-Instruct
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 6.6
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/Llama-3.1-8B-Experimental-1206-Instruct
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 8.5
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/Llama-3.1-8B-Experimental-1206-Instruct
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 28.1
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sethuiyer/Llama-3.1-8B-Experimental-1206-Instruct
+      name: Open LLM Leaderboard
 ---
 # Llama 3.1 8B Experimental 1206
 
@@ -56,3 +151,17 @@ Research is ongoing to address the limitations of large language models. Efforts
 ### **Conclusion**
 
 Large language models represent a significant advancement in the field of artificial intelligence, demonstrating remarkable abilities to process and generate human language. Their versatility and power have opened up numerous applications across industries, from healthcare and education to entertainment and customer service. However, realizing their full potential requires addressing the ethical, technical, and societal challenges they present. As research and development continue, large language models are poised to become even more integral to the way we interact with technology and each other.
+
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/sethuiyer__Llama-3.1-8B-Experimental-1206-Instruct-details)
+
+| Metric             |Value|
+|--------------------|----:|
+| Avg.               |25.67|
+| IFEval (0-Shot)    |69.67|
+| BBH (3-Shot)       |30.06|
+| MATH Lvl 5 (4-Shot)|11.10|
+| GPQA (0-shot)      | 6.60|
+| MuSR (0-shot)      | 8.50|
+| MMLU-PRO (5-shot)  |28.10|
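As a quick sanity check on the table added by this PR, the `Avg.` row is the unweighted arithmetic mean of the six benchmark scores. A minimal sketch (scores copied from the diff; the dictionary is illustrative, not part of any leaderboard API):

```python
# Benchmark scores from the Open LLM Leaderboard table added in this PR.
scores = {
    "IFEval (0-Shot)": 69.67,
    "BBH (3-Shot)": 30.06,
    "MATH Lvl 5 (4-Shot)": 11.10,
    "GPQA (0-shot)": 6.60,
    "MuSR (0-shot)": 8.50,
    "MMLU-PRO (5-shot)": 28.10,
}

# The reported average is the unweighted mean of the six scores.
average = round(sum(scores.values()) / len(scores), 2)
print(average)  # 25.67
```

This matches the `Avg. |25.67|` row, confirming the table is internally consistent.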