ptrdvn committed on
Commit adedb66 · verified · 1 Parent(s): 56633c4

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -110,7 +110,7 @@ We compare this to the original R1 model and test in both regimes where repetiti
 Code for the SakanaAI/gsm8k-ja-test_250-1319 evaluation can be found [here](https://drive.google.com/file/d/1gCzCJv5vasw8R3KVQimfoIDFyfxwxNvC/view?usp=sharing).
 
 
- We further use the first 50 prompts from [DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja](https://huggingface.co/datasets/DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja) to evaluate the percentage of valid Japanese `\<think\>` sections in model responses.
+ We further use the first 50 prompts from [DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja](https://huggingface.co/datasets/DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja) to evaluate the percentage of valid Japanese `<think>` sections in model responses.
 This benchmark contains more varied and complex prompts, meaning this is a more realistic evaluation of how reliably this model can output Japanese.
 
 | | Repetition Penalty | Valid Japanese `<think>` (%) |
@@ -132,8 +132,8 @@ We made the data for this model using the following steps:
 4. Generate answers to prompts using [deepseek-ai/DeepSeek-R1-Distill-Llama-70B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B).
 5. Filter out responses which did not:
  * Finish within 2048 tokens
- * Contain a valid `\<think\>` section
- * Have the `\<think\>` section written in Japanese
+ * Contain a valid `<think>` section
+ * Have the `<think>` section written in Japanese
 
 We used this data to train our model using supervised fine tuning on [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) with the [ecs.gn8is-8x.32xlarge](https://www.alibabacloud.com/help/en/ecs/user-guide/gpu-accelerated-compute-optimized-and-vgpu-accelerated-instance-families-1) instance.
 
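
For readers of this diff, the filter described in step 5 of the README could look roughly like the sketch below. This is not the authors' code: the README does not say how "valid" or "written in Japanese" were measured, so the single-pair `<think>...</think>` check, the kana/kanji character ratio, and the `min_ja_ratio=0.5` threshold are all assumptions, and the helper names are hypothetical. The token count is passed in by the caller because the tokenizer is not specified in this change.

```python
import re

# Assumption: a "valid" section is exactly one non-empty <think>...</think> pair.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
# Hiragana, Katakana, and CJK Unified Ideographs.
JA_CHAR_RE = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff]")


def extract_think(response):
    """Return the text of the single <think> section, or None if invalid."""
    matches = THINK_RE.findall(response)
    if len(matches) != 1:
        return None
    body = matches[0].strip()
    return body or None


def japanese_ratio(text):
    """Fraction of alphabetic characters that are kana or kanji."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    ja = sum(1 for ch in letters if JA_CHAR_RE.match(ch))
    return ja / len(letters)


def keep_response(response, n_tokens, max_tokens=2048, min_ja_ratio=0.5):
    """Apply the three filters from step 5 (hypothetical thresholds)."""
    if n_tokens > max_tokens:
        return False  # did not finish within the token budget
    think = extract_think(response)
    if think is None:
        return False  # no valid <think> section
    return japanese_ratio(think) >= min_ja_ratio
```

The same `extract_think` and `japanese_ratio` helpers could also be used to compute the "Valid Japanese `<think>` (%)" column by averaging `keep_response`-style checks over the 50 evaluation prompts, again under the assumptions stated above.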