Sao10K/Fimbulvetr-11B-v2.1-16K

Sao10K committed on Jun 29

Commit 24fc6a3 • 1 Parent(s): 501a245

Update README.md

Files changed (1)

README.md +1 -1

@@ -11,7 +11,7 @@ Trained with compute from [Backyard.ai](https://backyard.ai/) | Thanks to them a
 Fimbulvetr-v2 but extended to 16K with PoSE. A sane context value would be ~12K before it degrades.
 Note:
-<br> \- I left Rope Theta at 10K for this train, instead of expanding it like with Stheno 3.3. Solar did not play will with extended theta, grad norm / loss values went parabolic or plunged from 10000+ down. Unreliable pretty much, unlike Stheno 3.3's training run.
+<br> \- I left Rope Theta at 10K for this train, instead of expanding it like with Stheno 3.3. Solar did not play well with extended theta, grad norm / loss values went parabolic or plunged from 10000+ down. Unreliable pretty much, unlike Stheno 3.3's training run.
 ---