Update README.md
README.md CHANGED
@@ -9,7 +9,7 @@ A frankenMoE of [heegyu/WizardVicuna-Uncensored-3B-0719](https://huggingface.co/
 
 Unlike the last model, this is just the same model being used 16 times as experts. I felt like this would allow it to be more coherent, which was correct.
 
-# [What is a Mixture of Experts (MoE)?](https://huggingface.co/blog/moe)
+# "[What is a Mixture of Experts (MoE)?](https://huggingface.co/blog/moe)"
 ### (from the MistralAI papers...click the quoted question above to navigate to it directly.)
 
 The scale of a model is one of the most important axes for better model quality. Given a fixed computing budget, training a larger model for fewer steps is better than training a smaller model for more steps.
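
As a rough illustration of the sparse Mixture-of-Experts idea behind the heading above, here is a hypothetical PyTorch-style sketch (not this model's actual code; the names `SparseMoE`, `FeedForward`, and `router`, and all sizes, are invented for the example). A learned router sends each token to the top 2 of 16 experts, and every expert starts as a copy of the same feed-forward block, mirroring the "same model being used 16 times as experts" setup described in the README:

```python
# Hypothetical sketch only: a sparse MoE layer whose 16 experts all start
# as copies of one feed-forward block. Not the implementation of this model.
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F


class FeedForward(nn.Module):
    """A plain MLP block standing in for one 'expert'."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.GELU(),
            nn.Linear(d_hidden, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class SparseMoE(nn.Module):
    """Route each token to the top-k of n_experts identical expert copies."""

    def __init__(self, d_model: int, d_hidden: int, n_experts: int = 16, top_k: int = 2):
        super().__init__()
        shared = FeedForward(d_model, d_hidden)
        # "Same model used 16 times": every expert begins as a copy of one block.
        self.experts = nn.ModuleList([copy.deepcopy(shared) for _ in range(n_experts)])
        self.router = nn.Linear(d_model, n_experts)  # learned gate over experts
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model)
        logits = self.router(x)                         # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)  # keep the top-k experts per token
        weights = F.softmax(weights, dim=-1)            # normalize over the kept experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                # tokens whose slot-th pick is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


# Usage: push 8 tokens of width 64 through a 16-expert layer.
layer = SparseMoE(d_model=64, d_hidden=256)
tokens = torch.randn(8, 64)
print(layer(tokens).shape)  # torch.Size([8, 64])
```

One caveat about this sketch (a property of the toy code, not a claim about the model above): while the experts remain identical and the routing weights sum to 1, the layer's output equals a single expert's output, so the experts only become meaningfully different once they are trained further or merged from diverging sources.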