Update README.md
README.md CHANGED
@@ -9,7 +9,7 @@ A frankenMoE of [heegyu/WizardVicuna-Uncensored-3B-0719](https://huggingface.co/
 
 Unlike the last model, this is just the same model being used 16 times as experts. I felt like this would allow it to be more coherent, which was correct.
 
-# [What is a Mixture of Experts (MoE)?](https://huggingface.co/blog/moe)
+# "[What is a Mixture of Experts (MoE)?](https://huggingface.co/blog/moe)"
 ### (from the MistralAI papers...click the quoted question above to navigate to it directly.)
 
 The scale of a model is one of the most important axes for better model quality. Given a fixed computing budget, training a larger model for fewer steps is better than training a smaller model for more steps.
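
As a rough illustration of the sparse Mixture-of-Experts idea behind the heading above, here is a hypothetical PyTorch-style sketch (not this model's actual code; the names `SparseMoE`, `FeedForward`, and `router`, and all sizes, are invented for the example). A learned router sends each token to the top 2 of 16 experts, and every expert starts as a copy of the same feed-forward block, mirroring the "same model being used 16 times as experts" setup described in the README:

```python
# Hypothetical sketch only: a sparse MoE layer whose 16 experts all start
# as copies of one feed-forward block. Not the implementation of this model.
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F


class FeedForward(nn.Module):
    """A plain MLP block standing in for one 'expert'."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.GELU(),
            nn.Linear(d_hidden, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class SparseMoE(nn.Module):
    """Route each token to the top-k of n_experts identical expert copies."""

    def __init__(self, d_model: int, d_hidden: int, n_experts: int = 16, top_k: int = 2):
        super().__init__()
        shared = FeedForward(d_model, d_hidden)
        # "Same model used 16 times": every expert begins as a copy of one block.
        self.experts = nn.ModuleList([copy.deepcopy(shared) for _ in range(n_experts)])
        self.router = nn.Linear(d_model, n_experts)  # learned gate over experts
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model)
        logits = self.router(x)                         # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)  # keep the top-k experts per token
        weights = F.softmax(weights, dim=-1)            # normalize over the kept experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                # tokens whose slot-th pick is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


# Usage: push 8 tokens of width 64 through a 16-expert layer.
layer = SparseMoE(d_model=64, d_hidden=256)
tokens = torch.randn(8, 64)
print(layer(tokens).shape)  # torch.Size([8, 64])
```

One caveat about this sketch (a property of the toy code, not a claim about the model above): while the experts remain identical and the routing weights sum to 1, the layer's output equals a single expert's output, so the experts only become meaningfully different once they are trained further or merged from diverging sources.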