leaderboard-pr-bot committed on
Commit
1938fec
1 Parent(s): 0e0fc11

Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)
  1. README.md +111 -1
README.md CHANGED
@@ -1 +1,111 @@
- This is a model released from the preprint: [SimPO: Simple Preference Optimization with a Reference-Free Reward](https://arxiv.org/abs/2405.14734). Please refer to our [repository](https://github.com/princeton-nlp/SimPO) for more details.
+ ---
+ model-index:
+ - name: Llama-3-Instruct-8B-DPO-v0.2
+   results:
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: IFEval (0-Shot)
+       type: HuggingFaceH4/ifeval
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: inst_level_strict_acc and prompt_level_strict_acc
+       value: 72.08
+       name: strict accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=princeton-nlp/Llama-3-Instruct-8B-DPO-v0.2
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BBH (3-Shot)
+       type: BBH
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc_norm
+       value: 28.94
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=princeton-nlp/Llama-3-Instruct-8B-DPO-v0.2
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MATH Lvl 5 (4-Shot)
+       type: hendrycks/competition_math
+       args:
+         num_few_shot: 4
+     metrics:
+     - type: exact_match
+       value: 5.51
+       name: exact match
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=princeton-nlp/Llama-3-Instruct-8B-DPO-v0.2
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GPQA (0-shot)
+       type: Idavidrein/gpqa
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 4.92
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=princeton-nlp/Llama-3-Instruct-8B-DPO-v0.2
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MuSR (0-shot)
+       type: TAUR-Lab/MuSR
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 5.56
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=princeton-nlp/Llama-3-Instruct-8B-DPO-v0.2
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU-PRO (5-shot)
+       type: TIGER-Lab/MMLU-Pro
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 30.77
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=princeton-nlp/Llama-3-Instruct-8B-DPO-v0.2
+       name: Open LLM Leaderboard
+ ---
+ This is a model released from the preprint: [SimPO: Simple Preference Optimization with a Reference-Free Reward](https://arxiv.org/abs/2405.14734). Please refer to our [repository](https://github.com/princeton-nlp/SimPO) for more details.
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_princeton-nlp__Llama-3-Instruct-8B-DPO-v0.2)
+
+ |      Metric       |Value|
+ |-------------------|----:|
+ |Avg.               |24.63|
+ |IFEval (0-Shot)    |72.08|
+ |BBH (3-Shot)       |28.94|
+ |MATH Lvl 5 (4-Shot)| 5.51|
+ |GPQA (0-shot)      | 4.92|
+ |MuSR (0-shot)      | 5.56|
+ |MMLU-PRO (5-shot)  |30.77|
+
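
As a quick sanity check on the results table, the `Avg.` row appears to be the unweighted mean of the six benchmark scores. The sketch below is a minimal Python check under that assumption; the names `scores` and `mean_score` are illustrative and are not part of the leaderboard tooling.

```python
# Benchmark scores copied from the results table in this PR.
scores = {
    "IFEval (0-Shot)": 72.08,
    "BBH (3-Shot)": 28.94,
    "MATH Lvl 5 (4-Shot)": 5.51,
    "GPQA (0-shot)": 4.92,
    "MuSR (0-shot)": 5.56,
    "MMLU-PRO (5-shot)": 30.77,
}


def mean_score(values):
    """Unweighted arithmetic mean, rounded to two decimals.

    Assumption: the leaderboard's "Avg." is a plain mean of the listed scores.
    """
    values = list(values)
    return round(sum(values) / len(values), 2)


average = mean_score(scores.values())
print(f"Avg. = {average:.2f}")  # prints "Avg. = 24.63"
```

The computed value matches the 24.63 reported in the table, which is consistent with the plain-mean assumption but does not confirm how the leaderboard derives it.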