Update README.md
Code for the SakanaAI/gsm8k-ja-test_250-1319 evaluation can be found [here](https://drive.google.com/file/d/1gCzCJv5vasw8R3KVQimfoIDFyfxwxNvC/view?usp=sharing).
We further use the first 50 prompts from [DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja](https://huggingface.co/datasets/DeL-TaiseiOzaki/Tengentoppa-sft-reasoning-ja) to evaluate the percentage of valid Japanese `<think>` sections in model responses.
This benchmark contains more varied and complex prompts, meaning this is a more realistic evaluation of how reliably this model can output Japanese.
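This metric needs a way to decide whether a response contains a valid `<think>` section and whether that section is written in Japanese. The actual evaluation code is not shown here; the sketch below is our own illustration, assuming a regex over `<think>…</think>` and a character-range heuristic (at least half of the non-space characters are kana or CJK ideographs) as the "Japanese" test:

```python
import re

# Hypothetical helper, not the project's evaluation script.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def is_japanese_think(response: str, min_ratio: float = 0.5) -> bool:
    match = THINK_RE.search(response)
    if match is None:
        return False  # no valid <think> section at all
    text = match.group(1)
    # Count hiragana, katakana, and CJK ideographs as "Japanese" characters.
    jp = sum(1 for ch in text if "\u3040" <= ch <= "\u30ff" or "\u4e00" <= ch <= "\u9fff")
    chars = sum(1 for ch in text if not ch.isspace())
    return chars > 0 and jp / chars >= min_ratio

print(is_japanese_think("<think>これは日本語の思考です。</think>答え"))  # True
print(is_japanese_think("<think>This is English reasoning.</think>"))   # False
```

The reported percentage would then be the fraction of the 50 responses for which such a check returns true.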
| | Repetition Penalty | Valid Japanese `<think>` (%) |

We made the data for this model using the following steps:

4. Generate answers to prompts using [deepseek-ai/DeepSeek-R1-Distill-Llama-70B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B).
5. Filter out responses which did not:
   * Finish within 2048 tokens
   * Contain a valid `<think>` section
   * Have the `<think>` section written in Japanese
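A compact sketch of this filtering pass (hypothetical code, not the pipeline actually used; the `token_count` bookkeeping and the 50% Japanese-character threshold are our assumptions):

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def passes_filter(response: str, token_count: int, max_tokens: int = 2048) -> bool:
    """Apply the three filter criteria to one generated response."""
    if token_count >= max_tokens:   # generation hit the cap: did not finish
        return False
    m = THINK_RE.search(response)
    if m is None:                   # no valid <think> section
        return False
    think = m.group(1)
    # Heuristic (our assumption): the <think> section counts as Japanese if at
    # least half of its non-space characters are kana or CJK ideographs.
    jp = sum(1 for c in think if "\u3040" <= c <= "\u30ff" or "\u4e00" <= c <= "\u9fff")
    total = sum(1 for c in think if not c.isspace())
    return total > 0 and jp / total >= 0.5

# Keep only responses that satisfy all three criteria.
responses = [("<think>考え中です</think>答え", 120),
             ("<think>thinking in English</think>", 120)]
kept = [r for r, n in responses if passes_filter(r, n)]
```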
We used this data to train our model with supervised fine-tuning on [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory), using an [ecs.gn8is-8x.32xlarge](https://www.alibabacloud.com/help/en/ecs/user-guide/gpu-accelerated-compute-optimized-and-vgpu-accelerated-instance-families-1) instance.
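For reference, an SFT run on LLaMA-Factory is typically driven by a YAML config along these lines. The values below are illustrative placeholders (dataset name, template, hyperparameters, and output path are our assumptions, not the actual training config):

```yaml
### model
model_name_or_path: deepseek-ai/DeepSeek-R1-Distill-Llama-70B  # placeholder base model

### method
stage: sft
do_train: true
finetuning_type: full

### dataset
dataset: japanese_reasoning_sft   # placeholder dataset name
template: deepseek                # placeholder chat template
cutoff_len: 4096

### train
output_dir: saves/r1-distill-ja-sft   # placeholder path
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-5
num_train_epochs: 3.0
bf16: true
```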