something-else committed · Commit b085b1a · verified · 1 parent: c2ab722

Update README.md

Files changed (1): README.md (+10 −1)
@@ -23,4 +23,13 @@ tags:
  - rwkv-v5-stp62-N8.pth : 3B rocm-rwkv model starting with the previous but now with 62 epochs of N8 dataset with --lr_init 7e-6 --lr_final 7e-6. This pth has a loss of 1.790 for N8 and 42.538 GTokens.
  - rwkv-v5-stp76-N8.pth : 3B rocm-rwkv model starting with the previous but now with 62 epochs of N8 dataset with --lr_init 7e-6 --lr_final 7e-6. This pth has a loss of 1.780 for N8 and 51.763 GTokens.
  - rwkv-v5-stp118-N8.pth : 3B rocm-rwkv model starting with the previous but now with 118 epochs of N8 dataset with --lr_init 7e-6 --lr_final 7e-6. This pth has a loss of 1.750 for N8 and 79.508 GTokens.
- - rwkv-v5-stp146-N8.pth : 3B rocm-rwkv model starting with the previous but now with 146 epochs of N8 dataset with --lr_init 7e-6 --lr_final 7e-6. This pth has a loss of 1.758 for N8 and 97.982 GTokens.
+ - rwkv-v5-stp146-N8.pth : 3B rocm-rwkv model starting with the previous but now with 146 epochs of N8 dataset with --lr_init 7e-6 --lr_final 7e-6. This pth has a loss of 1.758 for N8 and 97.982 GTokens.
+
+
+ 7B rocm-rwkv pth record: I call this model Tlanuwa since I added an extra training run focusing on Cherokee after each run.
+
+ 9B rocm-rwkv pth record: 40 layers, embd=4096, ctx=16384. I call this model Quetzal since I added an extra training run focusing on Spanish and the Axolotl-Spanish-Nahuatl dataset after each run.
+ - rwkv-9Q-stp101-N8.pth: 9B rocm-rwkv model trained with SlimPajama chunks 1-10 for the first epoch and additional training with chunks 1-2 and a mix of multi-language and code; after that I am using the N8 dataset. I am currently at 4.222 GTokens with the N8 dataset. This pth has a loss of 1.904.
+
+
+
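As a quick sanity check on the 3B checkpoint records above, the cumulative token counts grow roughly linearly with the N8 epoch count, at about 0.66 GTokens per epoch between consecutive checkpoints. A minimal sketch using only the numbers stated in the README (the variable names are illustrative, not from the repo):

```python
# (epochs of N8, cumulative GTokens) for each 3B checkpoint listed above.
records = [(62, 42.538), (76, 51.763), (118, 79.508), (146, 97.982)]

# GTokens consumed per N8 epoch between consecutive checkpoints.
rates = [
    (tok2 - tok1) / (ep2 - ep1)
    for (ep1, tok1), (ep2, tok2) in zip(records, records[1:])
]
print(rates)  # each rate is close to 0.66 GTokens/epoch
```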