Update README.md
README.md CHANGED
@@ -6,7 +6,7 @@ tags:
 - merge
 ---
 ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/6589d7e6586088fd2784a12c/JGAuV97AOR2bCKbR2iNPc.jpeg)
-#
+# Intuition sharp as a blade
 
 A merge of [fblgit/UNA-TheBeagle-7b-v1](https://huggingface.co/fblgit/UNA-TheBeagle-7b-v1), [Open-Orca/Mistral-7B-OpenOrca](https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca), [samir-fama/FernandoGPT-v1](https://huggingface.co/samir-fama/FernandoGPT-v1) and [Neuronovo/neuronovo-7B-v0.3](https://huggingface.co/Neuronovo/neuronovo-7B-v0.3).
 
@@ -43,4 +43,4 @@ If all our tokens are sent to just a few popular experts, that will make trainin
 
 
 ## "Wait...but you called this a frankenMoE?"
-The difference between MoE and "frankenMoE" lies in the fact that the router layer in a model like the one on this repo is not trained simultaneously. There are rumors about someone developing a way for us to unscuff these frankenMoE models by training the router layer simultaneously.
+The difference between MoE and "frankenMoE" lies in the fact that the router layer in a model like the one on this repo is not trained simultaneously. There are rumors about someone developing a way for us to unscuff these frankenMoE models by training the router layer simultaneously. This model seems to overcome that limitation.
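For readers unfamiliar with the "router layer" the added sentence refers to: below is a minimal sketch of a Mixtral-style top-2 gate, assuming four experts to mirror the four merged source models. The names (`Top2Router`, `hidden_size`, `num_experts`) are illustrative, not this repo's actual code; the point is that in a frankenMoE this gate is initialized rather than trained jointly with the experts.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2Router(nn.Module):
    """Mixtral-style gate: every token is sent to its top-2 of N experts."""

    def __init__(self, hidden_size: int = 4096, num_experts: int = 4):
        super().__init__()
        # In a regular MoE this projection is learned jointly with the experts.
        # In a frankenMoE it is only initialized (e.g. from prompt hidden
        # states, or randomly) and never trained -- the "scuffed" part.
        self.gate = nn.Linear(hidden_size, num_experts, bias=False)

    def forward(self, hidden_states: torch.Tensor):
        logits = self.gate(hidden_states)                  # (tokens, num_experts)
        weights, chosen = torch.topk(logits, k=2, dim=-1)  # best 2 experts/token
        weights = F.softmax(weights, dim=-1)               # renormalize over the 2
        return weights, chosen

router = Top2Router()
tokens = torch.randn(5, 4096)   # 5 token hidden states
weights, chosen = router(tokens)
print(chosen)    # which 2 of the 4 experts each token is routed to
print(weights)   # mixing weights for blending those experts' outputs
```

An untrained gate still produces a valid top-2 mixture; it just has no learned reason to prefer one expert over another for a given token, which is why training the router simultaneously, as the rumored method would, matters.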