yixinsong committed · Commit d29e0f4 · verified · 1 Parent(s): ccbda51

Update README.md

Files changed (1): README.md (+50 -16)
README.md CHANGED
@@ -35,30 +35,64 @@ SmallThinker is designed for the following use cases:
  The model was trained using 8 H100 GPUs with a global batch size of 16. The specific configuration is as follows:

- ```
- neat_packing: true
- cutoff_len: 16384
- per_device_train_batch_size: 2
- gradient_accumulation_steps: 1
- learning_rate: 1.0e-5
- num_train_epochs: 3
- lr_scheduler_type: cosine
- warmup_ratio: 0.02
- bf16: true
- ddp_timeout: 180000000
- weight_decay: 0.0
- ```
-
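For reference, the removed hyperparameters are consistent with the stated global batch size, assuming standard data-parallel accumulation across the 8 GPUs:

$$
\text{global batch} = N_{\text{GPU}} \times \text{per-device batch} \times \text{grad. accum.} = 8 \times 2 \times 1 = 16
$$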
The SFT (Supervised Fine-Tuning) process was conducted in two phases:

1. First Phase:
   - Used only the PowerInfer/QWQ-LONGCOT-500K dataset
   - Trained for 1.5 epochs
-
+ ```
+ ### model
+ model_name_or_path: saves/qwen2-01-qat/full/sft/checkpoint-24000
+
+ ### method
+ stage: sft
+ do_train: true
+ finetuning_type: full
+ deepspeed: examples/deepspeed/ds_z3_config.json
+
+ ### dataset
+ dataset: o1-v2
+ template: qwen
+ neat_packing: true
+ cutoff_len: 16384
+ overwrite_cache: true
+ preprocessing_num_workers: 16
+
+ ### output
+ output_dir: saves/qwen2-01-qat/full/sft
+ logging_steps: 1
+ save_steps: 1000
+ plot_loss: true
+ overwrite_output_dir: true
+ ```
2. Second Phase:
   - Combined training with PowerInfer/QWQ-LONGCOT-500K and PowerInfer/LONGCOT-Refine datasets
   - Continued training for 2 additional epochs
-
+ ```
+ ### model
+ model_name_or_path: /home/syx/Qwen2.5-3B-Instruct
+
+ ### method
+ stage: sft
+ do_train: true
+ finetuning_type: full
+ deepspeed: examples/deepspeed/ds_z3_config.json
+
+ ### dataset
+ dataset: o1-v2, o1-v3
+ template: qwen
+ neat_packing: true
+ cutoff_len: 16384
+ overwrite_cache: true
+ preprocessing_num_workers: 16
+
+ ### output
+ output_dir: saves/qwen2-01-qat/full/sft
+ logging_steps: 1
+ save_steps: 1000
+ plot_loss: true
+ overwrite_output_dir: true
+ ```
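The per-phase blocks above omit the run hyperparameters that the removed block listed. The key names match LLaMA-Factory's YAML config schema (an inference from keys such as `finetuning_type`, `neat_packing`, and `template`; the README does not name the framework), where those values would live in a `### train` section of the same file. A minimal sketch, assuming the previously listed values carried over:

```
### train
# Sketch: values copied from the removed block above; whether they
# were kept unchanged for both phases is an assumption.
per_device_train_batch_size: 2
gradient_accumulation_steps: 1
learning_rate: 1.0e-5
num_train_epochs: 3  # per-phase epoch counts differ (1.5 and 2), so this would be adjusted per run
lr_scheduler_type: cosine
warmup_ratio: 0.02
bf16: true
ddp_timeout: 180000000
weight_decay: 0.0
```

If these are indeed LLaMA-Factory configs, each phase would be launched by pointing the trainer at the corresponding file, e.g. `llamafactory-cli train <phase_config>.yaml`.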
  ## Limitations & Disclaimer