RASMUS committed on
Commit ec6248e (verified)
1 Parent(s): e7fb7a8

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -160,7 +160,7 @@ vocabulary size is 64k tokens. Inputs are sequences of 2048 consecutive tokens.
 
  ### Supervised fine-tuning (SFT)
 
- This model was first supervised fine-tuned (SFT) using the [unsloth](https://github.com/unslothai/unsloth) framework with a single NVIDIA GeForce RTX 4080 GPU. The model was fine-tuned for 1 epoch with a learning rate of 5e-05, weight decay of 5e-03, learning rate warmup ratio of 0.1 with cosine decay, batch size of 4 and gradient accumulation of 8 for an effective batch size of 32, max sequence length of 2048, and NEFTune noise alpha of 5. The optimizer was "paged_adamw_8bit" and the model was loaded with 4-bit quantization. Training was done using Rank-Stabilized LoRA (RSLora) with a rank of 256 and alpha of 128, LoRA dropout of 0.02, and target modules of "q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj".
+ This model was first supervised fine-tuned (SFT) using the [unsloth](https://github.com/unslothai/unsloth) framework with a single NVIDIA GeForce RTX 4080 GPU. The model was fine-tuned for 1 epoch with a learning rate of 5e-05, weight decay of 5e-03, learning rate warmup ratio of 0.1 with cosine decay, batch size of 4 and gradient accumulation of 8 for an effective batch size of 32, max sequence length of 2048, and NEFTune noise alpha of 5. The optimizer was "paged_adamw_8bit" and the model was loaded with 4-bit quantization. Training was done using Rank-Stabilized LoRA (RSLora) with a rank of 256 and alpha of 128, LoRA dropout of 0.02, target modules of "q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", and modules_to_save of "lm_head", "embed_tokens".
 
  ### Direct Preference Optimization (DPO) fine-tuning
 
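The SFT paragraph packs several derived quantities into one sentence. The sketch below makes three of them explicit using only the standard library: the effective batch size, the LoRA update scaling (standard LoRA scales by alpha / r, while rank-stabilized LoRA scales by alpha / sqrt(r)), and a linear-warmup-plus-cosine-decay learning-rate curve. It is an illustration under stated assumptions, not the training code used for this model. The helper names are hypothetical. The warmup rounding and cosine shape are assumed to follow the common Hugging Face convention. The total step count is invented, because the README does not give the dataset size.

```python
import math

# Hyperparameters as stated in the README.
PER_DEVICE_BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 8
NUM_GPUS = 1  # single RTX 4080
LEARNING_RATE = 5e-05
WARMUP_RATIO = 0.1
LORA_RANK = 256
LORA_ALPHA = 128


def effective_batch_size(per_device: int, grad_accum: int, num_devices: int = 1) -> int:
    """Number of sequences contributing to each optimizer step."""
    return per_device * grad_accum * num_devices


def lora_scaling(alpha: float, rank: int, rank_stabilized: bool) -> float:
    """Factor multiplying the low-rank update B @ A.

    Standard LoRA uses alpha / r; rank-stabilized LoRA uses alpha / sqrt(r).
    """
    return alpha / math.sqrt(rank) if rank_stabilized else alpha / rank


def warmup_steps(total_steps: int, ratio: float) -> int:
    # ASSUMPTION: warmup ratio converted to steps by rounding up.
    return math.ceil(total_steps * ratio)


def cosine_lr(step: int, total_steps: int, warmup: int, base_lr: float) -> float:
    """Linear warmup from 0 to base_lr, then cosine decay to 0 at total_steps."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print("effective batch size:",
          effective_batch_size(PER_DEVICE_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_GPUS))
    print("LoRA scaling (standard):", lora_scaling(LORA_ALPHA, LORA_RANK, rank_stabilized=False))
    print("LoRA scaling (rsLoRA):", lora_scaling(LORA_ALPHA, LORA_RANK, rank_stabilized=True))

    # HYPOTHETICAL: 1000 total steps is illustrative only; the real count depends on the dataset.
    total = 1000
    warm = warmup_steps(total, WARMUP_RATIO)
    for s in (0, warm // 2, warm, total // 2, total):
        print(f"step {s:4d}: lr = {cosine_lr(s, total, warm, LEARNING_RATE):.2e}")
```

With rank 256 and alpha 128, standard LoRA would scale the adapter update by 0.5, whereas rsLoRA scales it by 8. This is why a large rank can still produce a meaningful update under RSLora. The sketch does not cover the change made in this commit, which adds "lm_head" and "embed_tokens" to modules_to_save. In PEFT, modules listed there are trained in full and saved alongside the adapter rather than being adapted through low-rank matrices.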