yuchenFan committed on
Commit e519e70 · 1 Parent(s): 6d9a234

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -155,7 +155,7 @@ We use Best-of-64 as our evaluation metric. The weighting methods are different
 
  | Method | Reward Model | MATH | AMC | AIME 2024 | OlympiadBench | Minerva Math | Avg |
  | --- | --- | --- | --- | --- | --- | --- | --- |
- | Greedy Pass @ 1 | N/A | 62.1 | 37.3 | 16.7 | 27.7 | 34.2 | 35.6 |
+ | Greedy Pass @ 1 | N/A | 64.6 | 30.1 | 16.7 | 31.9 | 35.3 | 35.7 |
  | Majority Voting @ 64 | N/A | 80.2 | 53.0 | 26.7 | 40.4 | 38.6 | 47.8 |
  | Best-of-N @ 64 | Skywork-o1-Open-PRM-Qwen-2.5-7B | 77.8 | 56.6 | 23.3 | 39.0 | 31.6 | 45.7 |
  | | EurusPRM-Stage 1 | 77.8 | 44.6 | **26.7** | 35.3 | 41.5 | 45.2 |
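
Review note: the change touches four benchmark scores and the Avg cell of the Greedy Pass @ 1 row, so it is worth checking that Avg still agrees with the per-benchmark numbers. The sketch below assumes Avg is the unweighted mean of the five benchmark columns, rounded to one decimal; the README does not state this in the lines shown, and the names `rows` and `mean_1dp` are only illustrative.

```python
# Assumption: "Avg" is the plain mean of MATH, AMC, AIME 2024,
# OlympiadBench and Minerva Math, rounded to one decimal place.
# Scores are copied from the table as it reads after this commit.
rows = {
    "Greedy Pass @ 1": ([64.6, 30.1, 16.7, 31.9, 35.3], 35.7),
    "Majority Voting @ 64": ([80.2, 53.0, 26.7, 40.4, 38.6], 47.8),
    "Best-of-N @ 64 / Skywork-o1-Open-PRM-Qwen-2.5-7B": ([77.8, 56.6, 23.3, 39.0, 31.6], 45.7),
    "Best-of-N @ 64 / EurusPRM-Stage 1": ([77.8, 44.6, 26.7, 35.3, 41.5], 45.2),
}


def mean_1dp(scores):
    """Unweighted mean of the scores, rounded to one decimal place."""
    return round(sum(scores) / len(scores), 1)


# Collect every row whose recomputed mean differs from the reported Avg.
mismatches = {
    name: (mean_1dp(scores), reported)
    for name, (scores, reported) in rows.items()
    if mean_1dp(scores) != reported
}
print(mismatches)
```

Under that assumption, an empty dictionary means every Avg cell matches its row. The same function can be applied to the removed row (62.1, 37.3, 16.7, 27.7, 34.2) to compare it against the previously reported 35.6.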