jnorthrup committed on
Commit ad1a52e · verified · 1 Parent(s): cba4908

Upload 9 files

Browse files
README.md CHANGED
@@ -1,5 +1,269 @@
  ---
- license: bsd
- base_model:
- - Skywork/Skywork-o1-Open-PRM-Qwen-2.5-7B
- ---
+ license: other
+ base_model: Qwen/Qwen2.5-Math-7B-Instruct
+ pipeline_tag: text-classification
+ ---
+
+ <div align="center">
+ <img src="misc/misc_fig.jpg" width="400"/>
+ <br>
+ 🤗 <a href="https://huggingface.co/Skywork" target="_blank">Hugging Face</a> • 🤖 <a href="https://modelscope.cn/organization/Skywork" target="_blank">ModelScope</a>
+ <br>
+ <br>
+ <br>
+ </div>
+
+ # Introduction
+
+ We are excited to announce the release of the Skywork o1 Open model series, developed by the Skywork team at Kunlun Inc. This release introduces a series of models that incorporate o1-like slow thinking and reasoning capabilities. The Skywork o1 Open model series includes three advanced models:
+ - **[Skywork o1 Open-Llama-3.1-8B](https://huggingface.co/Skywork/Skywork-o1-Open-Llama3.1-8B)**: A robust chat model trained on Llama-3.1-8B and significantly enhanced with "o1-style" data to improve reasoning skills.
+
+ - **[Skywork o1 Open-PRM-Qwen-2.5-1.5B](https://huggingface.co/Skywork/Skywork-o1-Open-PRM-Qwen2.5-1.5B)**: A specialized model designed to enhance reasoning through incremental process rewards, well suited to complex problem solving at a smaller scale.
+
+ - **[Skywork o1 Open-PRM-Qwen-2.5-7B](https://huggingface.co/Skywork/Skywork-o1-Open-PRM-Qwen2.5-7B)**: Extends the capabilities of the 1.5B model by scaling up to handle more demanding reasoning tasks.
+
+ Unlike mere reproductions of the OpenAI o1 model, the Skywork o1 Open model series not only exhibits innate thinking, planning, and reflecting capabilities in its outputs, but also shows significant improvements in reasoning skills on standard benchmarks. This series represents a strategic advancement in AI capabilities, moving a previously weaker base model toward the state of the art (SOTA) in reasoning tasks.
+
+ If you are interested in the Skywork o1 Open model series, please also check out the [o1-llama-3.1-8b](https://huggingface.co/Skywork/o1-llama-3.1-8b) model.
+
+
+ # Model Information
+ The Skywork-o1-Open-PRM models are trained on [**Qwen2.5-Math-1.5B-Instruct**](https://huggingface.co/Qwen/Qwen2.5-Math-1.5B-Instruct) and [**Qwen2.5-Math-7B-Instruct**](https://huggingface.co/Qwen/Qwen2.5-Math-7B-Instruct).
+
+
+ # PRM Evaluation
+
+ ## Evaluation Settings
+
+ ### Mathematical Evaluation
+ We used the evaluation scripts from [Qwen2.5-Math](https://github.com/QwenLM/Qwen2.5-Math) and followed their configuration to ensure consistency. The selected datasets are **GSM8K**, **MATH**, **GaoKao**, **CN-Middle School 24**, **OlympiadBench**, **AMC-23**, and **AIME-24**. **GaoKao** and **CN-Middle School 24** are Chinese datasets; the rest are in English. **OlympiadBench**, **AIME-24**, and **AMC-23** are competition-level datasets.
+
+ ### Code Evaluation
+ For code evaluation, we adopted the evaluation scripts from [Qwen2.5-Coder](https://github.com/QwenLM/Qwen2.5-Coder), largely keeping the same configuration. The selected datasets are **HumanEval**, **MBPP**, and **LiveCodeBench**, with **LiveCodeBench** restricted to the **2024.01–2024.11** subset. We use the latest version (0.3.1) of [evalplus](https://github.com/evalplus/evalplus) because earlier versions had issues with tests and code sanitization.
+
+
+ ## Evaluation Base Models
+
+ We evaluated the RMs on three base models: **Qwen2.5-7B-Instruct**, **Llama3.1-8B-Instruct**, and **Skywork-o1-Open-8B**, sampling responses from each to verify RM performance across different generators. The sampling temperature was set to **0.7** for mathematical problems and **1.0** for code-related tasks, as sketched below.
+
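+ The snippet below is an illustrative sketch of this sampling setup using vLLM's offline API; it is not the evaluation code itself, and the model path, prompt, and decoding limits are placeholders.
+
+ ```python
+ from vllm import LLM, SamplingParams
+
+ # 64 candidate answers per problem; temperature 0.7 for math (1.0 for code tasks).
+ llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")
+ params = SamplingParams(n=64, temperature=0.7, top_p=1.0, max_tokens=2048)
+ outputs = llm.generate(["<math problem prompt>"], params)
+ candidates = [o.text for o in outputs[0].outputs]
+ ```
+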
+
+ ## Compared RMs
+
+ - [Qwen2.5-Math-RM-72B](https://huggingface.co/Qwen/Qwen2.5-Math-RM-72B): An open-source ORM provided by the Qwen team.
+ - [OpenR-MATH-psa-PRM-7B](https://huggingface.co/openreasoner/Math-psa): An open-source PRM from the OpenR project.
+ - [RLHFlow-Deepseek-Data-PRM-8B](https://huggingface.co/RLHFlow/Llama3.1-8B-PRM-Deepseek-Data): An open-source PRM from the RLHFlow project.
+
+
+ ## Evaluation Metrics
+
+ - **Greedy Sampling Pass@1**: Generates a single answer per problem with greedy decoding.
+ - **Majority Voting@64**: Randomly samples 64 answers and selects the most frequent final answer.
+ - **Best-of-N@64**: Ranks 64 sampled answers by scores from the Reward Model (RM) and selects the best. The scoring differs between ORM and PRM, as the sketch after this list shows:
+   - For **ORM**, only the reward from the final step is used.
+   - For **PRM**, the average reward across all steps is used for weighting.
+
+
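+ A minimal sketch of the two scoring rules, assuming each candidate answer comes with a list of per-step rewards (the helper names are illustrative, not part of any released API):
+
+ ```python
+ def orm_score(step_rewards: list[float]) -> float:
+     # ORM: only the final step's reward counts.
+     return step_rewards[-1]
+
+ def prm_score(step_rewards: list[float]) -> float:
+     # PRM: average the rewards over all steps.
+     return sum(step_rewards) / len(step_rewards)
+
+ def best_of_n(candidates: list[str], rewards: list[list[float]], scorer) -> str:
+     # Return the candidate whose RM score is highest.
+     scores = [scorer(r) for r in rewards]
+     return candidates[scores.index(max(scores))]
+ ```
+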
+ ## Evaluation Results
+
+ ### Mathematical Evaluation
+
+ #### Skywork-o1-Open-8B
+ | Reward Model | Method | GSM8K | MATH | GaoKao | CN-Middle School 24 | OlympiadBench | AIME-24 | AMC-23 | Avg |
+ |--------------------------------|-------------------------|--------|--------|--------|---------------------|---------------|---------|--------|------|
+ | N/A | Greedy Sampling Pass@1 | 91.6 | 78.1 | 63.6 | 67.3 | 43.1 | 13.3 | 55.0 | 58.9 |
+ | N/A | Majority Voting@64 | 93.9 | 84.3 | 69.5 | 73.3 | 50.4 | 16.7 | 52.5 | 62.9 |
+ | OpenR-MATH-psa-PRM-**7B** | Best-of-N@64 | 95.1 | 82.7 | 67.1 | 70.3 | 47.6 | 20.0 | 57.5 | 62.9 |
+ | RLHFlow-Deepseek-Data-PRM-**8B** | Best-of-N@64 | 94.4 | 80.1 | 59.1 | 74.3 | 43.0 | 20.0 | 50.0 | 60.1 |
+ | Qwen2.5-Math-RM-**72B** | Best-of-N@64 | 96.1 | 86.9 | **76.0** | **76.2** | **53.3** | **26.7** | **65.0** | **68.6** |
+ | Skywork-o1-Open-PRM-**1.5B** | Best-of-N@64 | 94.5 | 85.0 | 65.6 | 73.3 | 49.9 | 16.7 | 62.5 | 63.9 |
+ | Skywork-o1-Open-PRM-**7B** | Best-of-N@64 | **96.7** | **87.0** | 70.3 | **76.2** | 52.3 | 23.3 | **65.0** | 67.3 |
+
+
+ #### Qwen2.5-7B-Instruct
+ | Reward Model | Method | GSM8K | MATH | GaoKao | CN-Middle School 24 | OlympiadBench | AIME-24 | AMC-23 | Avg |
+ |--------------------------------|-------------------------|--------|--------|--------|---------------------|---------------|---------|--------|------|
+ | N/A | Greedy Sampling Pass@1 | 91.9 | 75.2 | 55.6 | 75.2 | 39.1 | 13.3 | 45.0 | 56.5 |
+ | N/A | Majority Voting@64 | 93.5 | 78.4 | 55.3 | 78.2 | 40.1 | 13.3 | 62.5 | 60.2 |
+ | OpenR-MATH-psa-PRM-**7B** | Best-of-N@64 | 93.9 | 77.9 | 52.4 | 73.3 | 40.7 | 10.0 | 55.0 | 57.6 |
+ | RLHFlow-Deepseek-Data-PRM-**8B** | Best-of-N@64 | 94.1 | 78.1 | 53.2 | 75.2 | 39.1 | **16.7** | 55.0 | 58.8 |
+ | Qwen2.5-Math-RM-**72B** | Best-of-N@64 | 94.8 | **82.4** | **65.2** | **80.2** | **45.0** | 13.3 | 62.5 | **63.4** |
+ | Skywork-o1-Open-PRM-**1.5B** | Best-of-N@64 | 93.3 | 79.8 | 56.1 | 74.3 | 43.9 | 10.0 | **67.5** | 60.7 |
+ | Skywork-o1-Open-PRM-**7B** | Best-of-N@64 | **94.9** | 81.9 | 56.3 | 75.2 | 44.9 | 13.3 | 65.0 | 61.6 |
+
+
+
+ #### Llama3.1-8B-Instruct
+ | Reward Model | Method | GSM8K | MATH | GaoKao | CN-Middle School 24 | OlympiadBench | AIME-24 | AMC-23 | Avg |
+ |--------------------------------|-------------------------|--------|--------|--------|---------------------|---------------|---------|--------|------|
+ | N/A | Greedy Sampling Pass@1 | 85.3 | 49.7 | 25.3 | 47.5 | 16.6 | 6.7 | 27.5 | 36.9 |
+ | N/A | Majority Voting@64 | 90.9 | 62.9 | 28.0 | 56.4 | 26.4 | 13.3 | 37.5 | 45.1 |
+ | OpenR-MATH-psa-PRM-**7B** | Best-of-N@64 | 91.8 | 59.4 | 24.7 | 47.5 | 23.0 | 13.3 | 35.0 | 42.1 |
+ | RLHFlow-Deepseek-Data-PRM-**8B** | Best-of-N@64 | 89.8 | 56.1 | 24.0 | 40.6 | 20.4 | 0.0 | 35.0 | 38.0 |
+ | Qwen2.5-Math-RM-**72B** | Best-of-N@64 | **94.9** | **72.5** | **44.9** | **65.3** | **34.4** | **23.3** | **60.0** | **56.5** |
+ | Skywork-o1-Open-PRM-**1.5B** | Best-of-N@64 | 91.7 | 65.6 | 26.8 | 49.5 | 27.0 | 16.7 | **60.0** | 48.2 |
+ | Skywork-o1-Open-PRM-**7B** | Best-of-N@64 | 94.0 | 69.8 | 32.0 | 56.4 | 29.9 | 16.7 | 52.5 | 50.2 |
+
+
+ ### Code Evaluation
+ Since the compared PRMs have not been trained on code-related tasks, this section focuses solely on the performance of Skywork-o1-Open-PRM.
+
+ #### Skywork-o1-Open-8B
+
+ | Reward Model | Method | MBPP | MBPP+ | HumanEval | HumanEval+ | LiveCodeBench (2024.01–2024.11) |
+ |--------------------------|-------------------------|-------|-------|-----------|------------|-------------------------------|
+ | N/A | Greedy Sampling Pass@1 | 79.9 | 65.9 | **82.9** | **78.7** | 26.0 |
+ | Skywork-o1-Open-PRM-7B | Best-of-N@64 | **81.2** | **68.5** | 81.1 | 74.4 | **31.3** |
+
+
+ #### Qwen2.5-7B-Instruct
+
+ | Reward Model | Method | MBPP | MBPP+ | HumanEval | HumanEval+ | LiveCodeBench (2024.01–2024.11) |
+ |--------------------------|-------------------------|-------|-------|-----------|------------|-------------------------------|
+ | N/A | Greedy Sampling Pass@1 | 81.7 | 69.3 | **84.8** | **78.0** | 25.3 |
+ | Skywork-o1-Open-PRM-7B | Best-of-N@64 | **84.9** | **72.5** | 83.5 | **78.0** | **30.7** |
+
+ #### Llama3.1-8B-Instruct
+
+ | Reward Model | Method | MBPP | MBPP+ | HumanEval | HumanEval+ | LiveCodeBench (2024.01–2024.11) |
+ |--------------------------|-------------------------|-------|-------|-----------|------------|-------------------------------|
+ | N/A | Greedy Sampling Pass@1 | 75.7 | 64.0 | 68.3 | 63.4 | 16.1 |
+ | Skywork-o1-Open-PRM-7B | Best-of-N@64 | **77.8** | **66.1** | **73.8** | **67.7** | **23.3** |
+
+
+
+ # Quick Start
+ For detailed instructions on using the PRM, please refer to the [Skywork-o1-PRM-Inference repository](https://github.com/SkyworkAI/skywork-o1-prm-inference).
+
+
+
+ ## huggingface inference
+
+ 1. Clone the Skywork PRM inference repo:
+ ```shell
+ git clone https://github.com/SkyworkAI/skywork-o1-prm-inference.git
+ cd skywork-o1-prm-inference
+ ```
+ 2. Run PRM inference:
+
+ ```python
+ from transformers import AutoTokenizer
+ from model_utils.prm_model import PRM_MODEL
+ from model_utils.io_utils import prepare_input, prepare_batch_input_for_model, derive_step_rewards
+
+ prm_model_path = "/path/to/prm_model"
+ tokenizer = AutoTokenizer.from_pretrained(prm_model_path, trust_remote_code=True)
+ datas = [
+     {
+         "problem": "Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?",
+         "response": "To determine how much money Janet makes every day at the farmers' market, we need to follow these steps:\n1. **Calculate the total number of eggs laid by the ducks per day.**\n   Janet's ducks lay 16 eggs per day.\n2. **Determine the number of eggs Janet uses each day.**\n   - She eats 3 eggs for breakfast every morning.\n   - She bakes muffins for her friends every day with 4 eggs.\n   So, the total number of eggs used per day is:\n   \\[\n   3 + 4 = 7 \\text{ eggs}\n   \\]\n3. **Calculate the number of eggs Janet sells at the farmers' market each day.**\n   Subtract the number of eggs used from the total number of eggs laid:\n   \\[\n   16 - 7 = 9 \\text{ eggs}\n   \\]\n4. **Determine how much money Janet makes from selling the eggs.**\n   She sells each egg for $2, so the total amount of money she makes is:\n   \\[\n   9 \\times 2 = 18 \\text{ dollars}\n   \\]\nTherefore, the amount of money Janet makes every day at the farmers' market is $\\boxed{18}$."
+     },
+     {
+         "problem": "Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?",
+         "response": "To find out how much Janet makes every day at the farmers' market, we need to follow these steps:\n1. Calculate the total number of eggs laid by the ducks: \n   - Janet's ducks lay 16 eggs per day.\n2. Calculate the total number of eggs consumed by Janet and used for baking:\n   - Janet eats 3 eggs for breakfast.\n   - She bakes 4 eggs for muffins.\n   - Total eggs used: 3 (breakfast) + 4 (baking) = 7 eggs.\n3. Calculate the remaining number of eggs for sale:\n   - Total eggs laid: 16\n   - Eggs used: 7\n   - Remaining eggs: 16 - 7 = 9 eggs\n4. Calculate the total amount of money made at the farmers' market:\n   - Price per egg: $2\n   - Number of eggs sold: 9\n   - Total money made: 9 * $2 = $18\nTherefore, Janet makes $\\boxed{18}$ dollars every day at the farmers' market."
+     }
+ ]
+
+ # Split each response into steps on the step token ("\n") and mark the token
+ # positions at which step rewards should be read out.
+ processed_data = [prepare_input(d["problem"], d["response"], tokenizer=tokenizer, step_token="\n") for d in datas]
+ input_ids, steps, reward_flags = zip(*processed_data)
+
+ # Load the PRM and score the whole batch in one forward pass.
+ model = PRM_MODEL.from_pretrained(prm_model_path, device_map="auto").eval()
+ input_ids, attention_mask, reward_flags = prepare_batch_input_for_model(input_ids, reward_flags, tokenizer.pad_token_id)
+ _, _, rewards = model(input_ids=input_ids, attention_mask=attention_mask, return_probs=True)
+ step_rewards = derive_step_rewards(rewards, reward_flags)  # one list of step rewards per sample
+ print("step_rewards:", step_rewards[0])
+ print("step_rewards:", step_rewards[1])
+ ```
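+
+ To connect this with the Best-of-N@64 metric above, here is a hedged follow-up sketch (not part of the repo's API) that ranks the candidate responses to the same problem by their average step reward, as the PRM weighting does:
+
+ ```python
+ # Assumes `step_rewards` holds one list of per-step rewards per candidate
+ # response, as returned by derive_step_rewards above.
+ avg_rewards = [sum(r) / len(r) for r in step_rewards]
+ best_idx = max(range(len(avg_rewards)), key=avg_rewards.__getitem__)
+ print("best response:", datas[best_idx]["response"])
+ ```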
+
+ ## vllm server for inference
+
+ 1. Install vLLM and the vLLM PRM plugin:
+ ```shell
+ pip install vllm==v0.6.4.post1
+ git clone https://github.com/SkyworkAI/skywork-o1-prm-inference.git
+ cd skywork-o1-prm-inference
+ pip install -e .
+ ```
+
+ 2. Start the vLLM server:
+ ```shell
+ CUDA_VISIBLE_DEVICES=0,1,2,3 vllm serve /path/to/prm_model \
+     --host 0.0.0.0 \
+     --port 8081 \
+     --tensor-parallel-size 4 \
+     --gpu-memory-utilization 0.9 \
+     --enable-prefix-caching \
+     --dtype auto
+ ```
+
+ 3. Query the server for inference:
+
+ ```python
+ from openai import OpenAI
+ from transformers import AutoTokenizer
+ from model_utils.io_utils import prepare_input, derive_step_rewards_vllm
+
+ prm_model_path = "/path/to/prm_model"
+ tokenizer = AutoTokenizer.from_pretrained(prm_model_path, trust_remote_code=True)
+ datas = [
+     {
+         "problem": "Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?",
+         "response": "To determine how much money Janet makes every day at the farmers' market, we need to follow these steps:\n1. **Calculate the total number of eggs laid by the ducks per day.**\n   Janet's ducks lay 16 eggs per day.\n2. **Determine the number of eggs Janet uses each day.**\n   - She eats 3 eggs for breakfast every morning.\n   - She bakes muffins for her friends every day with 4 eggs.\n   So, the total number of eggs used per day is:\n   \\[\n   3 + 4 = 7 \\text{ eggs}\n   \\]\n3. **Calculate the number of eggs Janet sells at the farmers' market each day.**\n   Subtract the number of eggs used from the total number of eggs laid:\n   \\[\n   16 - 7 = 9 \\text{ eggs}\n   \\]\n4. **Determine how much money Janet makes from selling the eggs.**\n   She sells each egg for $2, so the total amount of money she makes is:\n   \\[\n   9 \\times 2 = 18 \\text{ dollars}\n   \\]\nTherefore, the amount of money Janet makes every day at the farmers' market is $\\boxed{18}$."
+     },
+     {
+         "problem": "Janet\u2019s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?",
+         "response": "To find out how much Janet makes every day at the farmers' market, we need to follow these steps:\n1. Calculate the total number of eggs laid by the ducks: \n   - Janet's ducks lay 16 eggs per day.\n2. Calculate the total number of eggs consumed by Janet and used for baking:\n   - Janet eats 3 eggs for breakfast.\n   - She bakes 4 eggs for muffins.\n   - Total eggs used: 3 (breakfast) + 4 (baking) = 7 eggs.\n3. Calculate the remaining number of eggs for sale:\n   - Total eggs laid: 16\n   - Eggs used: 7\n   - Remaining eggs: 16 - 7 = 9 eggs\n4. Calculate the total amount of money made at the farmers' market:\n   - Price per egg: $2\n   - Number of eggs sold: 9\n   - Total money made: 9 * $2 = $18\nTherefore, Janet makes $\\boxed{18}$ dollars every day at the farmers' market."
+     }
+ ]
+
+ # data preprocessing
+ processed_data = [prepare_input(d["problem"], d["response"], tokenizer=tokenizer, step_token="\n") for d in datas]
+ input_ids, steps, reward_flags = zip(*processed_data)
+
+ openai_api_key = "EMPTY"
+ openai_api_base = "http://localhost:8081/v1"
+ client = OpenAI(
+     # defaults to os.environ.get("OPENAI_API_KEY")
+     api_key=openai_api_key,
+     base_url=openai_api_base,
+ )
+ models = client.models.list()
+ model = models.data[0].id
+
+ # The PRM plugin returns token-level reward probabilities through the
+ # OpenAI-compatible embeddings endpoint.
+ rewards = client.embeddings.create(
+     input=input_ids,
+     model=model,
+ )
+
+ step_rewards = derive_step_rewards_vllm(rewards, reward_flags)
+ print("step_rewards:", step_rewards[0])
+ print("step_rewards:", step_rewards[1])
+ ```
+
+ # TODO
+ - Add more results for step-wise RM evaluation methods.
+ - Adjust the RM architecture to enhance compatibility with vLLM/sglang inference.
+ - Expand RM use cases by incorporating more types of reasoning tasks.
+ - Mitigate performance conflicts across different reasoning tasks.
+
+
+ # LICENSE
+ Community usage of the Skywork models requires the Skywork Community License. The Skywork models support commercial use. If you plan to use the Skywork models or their derivatives for commercial purposes, you must abide by the terms and conditions of the Skywork Community License.
+
+ # DISCLAIMER
+ We hereby declare that the Skywork models should not be used for any activities that pose a threat to national or societal security or engage in unlawful actions. Additionally, we request that users not deploy the Skywork models for internet services without appropriate security reviews and records. We hope all users will adhere to this principle so that technological advancements occur in a regulated and lawful environment.
+
+ We have done our utmost to ensure the compliance of the data used during the model's training process. However, despite our extensive efforts, unpredictable risks and issues may still arise due to the complexity of the model and data. Therefore, if any problems arise from the use of the Skywork open-source model, including but not limited to data security issues, public opinion risks, or any risks and problems caused by the model being misled, abused, disseminated, or improperly utilized, we will not assume any responsibility.
+
+ # Citation
+ If you find our work helpful, please feel free to cite us using the following BibTeX entry:
+ ```
+ @misc{skyworkopeno12024,
+   title={Skywork-o1 Open Series},
+   author={Skywork-o1 Team},
+   year={2024},
+   month={November},
+   howpublished={\url{https://huggingface.co/Skywork}},
+   url={https://huggingface.co/Skywork},
+ }
+ ```
added_tokens.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "</tool_call>": 151658,
+   "<tool_call>": 151657,
+   "<|box_end|>": 151649,
+   "<|box_start|>": 151648,
+   "<|endoftext|>": 151643,
+   "<|file_sep|>": 151664,
+   "<|fim_middle|>": 151660,
+   "<|fim_pad|>": 151662,
+   "<|fim_prefix|>": 151659,
+   "<|fim_suffix|>": 151661,
+   "<|im_end|>": 151645,
+   "<|im_start|>": 151644,
+   "<|image_pad|>": 151655,
+   "<|object_ref_end|>": 151647,
+   "<|object_ref_start|>": 151646,
+   "<|quad_end|>": 151651,
+   "<|quad_start|>": 151650,
+   "<|repo_name|>": 151663,
+   "<|video_pad|>": 151656,
+   "<|vision_end|>": 151653,
+   "<|vision_pad|>": 151654,
+   "<|vision_start|>": 151652
+ }
config.json ADDED
@@ -0,0 +1,33 @@
+ {
+   "architectures": [
+     "Qwen2ForPrmModel"
+   ],
+   "auto_map": {
+     "AutoConfig": "configuration_qwen2_rm.Qwen2RMConfig",
+     "AutoModel": "modeling_qwen2_rm.Qwen2ForRewardModel"
+   },
+   "attention_dropout": 0.0,
+   "bos_token_id": 151643,
+   "end_token_id": 151645,
+   "eos_token_id": 151645,
+   "hidden_act": "silu",
+   "hidden_size": 3584,
+   "initializer_range": 0.02,
+   "intermediate_size": 18944,
+   "max_position_embeddings": 4096,
+   "max_window_layers": 28,
+   "model_type": "qwen2",
+   "num_attention_heads": 28,
+   "num_hidden_layers": 28,
+   "num_key_value_heads": 4,
+   "pad_token_id": 151643,
+   "rms_norm_eps": 1e-06,
+   "rope_theta": 10000.0,
+   "sliding_window": null,
+   "tie_word_embeddings": false,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.44.0.dev0",
+   "use_cache": true,
+   "use_sliding_window": false,
+   "vocab_size": 152064
+ }
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3b582d51d50998e45d408ac3c28c84ae1d12a8bf4bc482a4cd88b48067a3680d
+ size 15231279170
special_tokens_map.json ADDED
@@ -0,0 +1,32 @@
+ {
+   "additional_special_tokens": [
+     "<|im_start|>",
+     "<|im_end|>",
+     "<|object_ref_start|>",
+     "<|object_ref_end|>",
+     "<|box_start|>",
+     "<|box_end|>",
+     "<|quad_start|>",
+     "<|quad_end|>",
+     "<|vision_start|>",
+     "<|vision_end|>",
+     "<|vision_pad|>",
+     "<|image_pad|>",
+     "<|video_pad|>"
+   ],
+   "bos_token": "<|im_start|>",
+   "eos_token": {
+     "content": "<|im_end|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<|endoftext|>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,207 @@
+ {
+   "add_bos_token": false,
+   "add_prefix_space": false,
+   "added_tokens_decoder": {
+     "151643": {
+       "content": "<|endoftext|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151644": {
+       "content": "<|im_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151645": {
+       "content": "<|im_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151646": {
+       "content": "<|object_ref_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151647": {
+       "content": "<|object_ref_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151648": {
+       "content": "<|box_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151649": {
+       "content": "<|box_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151650": {
+       "content": "<|quad_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151651": {
+       "content": "<|quad_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151652": {
+       "content": "<|vision_start|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151653": {
+       "content": "<|vision_end|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151654": {
+       "content": "<|vision_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151655": {
+       "content": "<|image_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151656": {
+       "content": "<|video_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "151657": {
+       "content": "<tool_call>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151658": {
+       "content": "</tool_call>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151659": {
+       "content": "<|fim_prefix|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151660": {
+       "content": "<|fim_middle|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151661": {
+       "content": "<|fim_suffix|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151662": {
+       "content": "<|fim_pad|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151663": {
+       "content": "<|repo_name|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "151664": {
+       "content": "<|file_sep|>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     }
+   },
+   "additional_special_tokens": [
+     "<|im_start|>",
+     "<|im_end|>",
+     "<|object_ref_start|>",
+     "<|object_ref_end|>",
+     "<|box_start|>",
+     "<|box_end|>",
+     "<|quad_start|>",
+     "<|quad_end|>",
+     "<|vision_start|>",
+     "<|vision_end|>",
+     "<|vision_pad|>",
+     "<|image_pad|>",
+     "<|video_pad|>"
+   ],
+   "bos_token": "<|im_start|>",
+   "chat_template": "{%- if tools %}\n    {{- '<|im_start|>system\\n' }}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- messages[0]['content'] }}\n    {%- else %}\n        {{- 'Please reason step by step, and put your final answer within \\\\boxed{}.' }}\n    {%- endif %}\n    {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n    {%- for tool in tools %}\n        {{- \"\\n\" }}\n        {{- tool | tojson }}\n    {%- endfor %}\n    {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n    {%- if messages[0]['role'] == 'system' %}\n        {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n    {%- else %}\n        {{- '<|im_start|>system\\nPlease reason step by step, and put your final answer within \\\\boxed{}.<|im_end|>\\n' }}\n    {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n    {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n        {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n    {%- elif message.role == \"assistant\" %}\n        {{- '<|im_start|>' + message.role }}\n        {%- if message.content %}\n            {{- '\\n' + message.content }}\n        {%- endif %}\n        {%- for tool_call in message.tool_calls %}\n            {%- if tool_call.function is defined %}\n                {%- set tool_call = tool_call.function %}\n            {%- endif %}\n            {{- '\\n<tool_call>\\n{\"name\": \"' }}\n            {{- tool_call.name }}\n            {{- '\", \"arguments\": ' }}\n            {{- tool_call.arguments | tojson }}\n            {{- '}\\n</tool_call>' }}\n        {%- endfor %}\n        {{- '<|im_end|>\\n' }}\n    {%- elif message.role == \"tool\" %}\n        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n            {{- '<|im_start|>user' }}\n        {%- endif %}\n        {{- '\\n<tool_response>\\n' }}\n        {{- message.content }}\n        {{- '\\n</tool_response>' }}\n        {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n            {{- '<|im_end|>\\n' }}\n        {%- endif %}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "<|im_end|>",
+   "errors": "replace",
+   "model_max_length": 1000000000.0,
+   "pad_token": "<|endoftext|>",
+   "split_special_tokens": false,
+   "tokenizer_class": "Qwen2Tokenizer",
+   "unk_token": null
+ }
vocab.json ADDED
The diff for this file is too large to render. See raw diff