Kquant03 committed (verified)
Commit 26e93b9 · 1 Parent(s): 1f6f14a

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -6,7 +6,7 @@ tags:
 - merge
 ---
 ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/6589d7e6586088fd2784a12c/JGAuV97AOR2bCKbR2iNPc.jpeg)
-# Borked MoE :(
+# Intuition sharp as a blade
 
 A merge of [fblgit/UNA-TheBeagle-7b-v1](https://huggingface.co/fblgit/UNA-TheBeagle-7b-v1), [Open-Orca/Mistral-7B-OpenOrca](https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca), [samir-fama/FernandoGPT-v1](https://huggingface.co/samir-fama/FernandoGPT-v1) and [Neuronovo/neuronovo-7B-v0.3](https://huggingface.co/Neuronovo/neuronovo-7B-v0.3).
 
@@ -43,4 +43,4 @@ If all our tokens are sent to just a few popular experts, that will make trainin
 
 
 ## "Wait...but you called this a frankenMoE?"
-The difference between MoE and "frankenMoE" lies in the fact that the router layer in a model like the one on this repo is not trained simultaneously. There are rumors about someone developing a way for us to unscuff these frankenMoE models by training the router layer simultaneously.
+The difference between MoE and "frankenMoE" lies in the fact that the router layer in a model like the one on this repo is not trained simultaneously. There are rumors about someone developing a way for us to unscuff these frankenMoE models by training the router layer simultaneously. This model seems to overcome that.
 
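For readers unfamiliar with the "router layer" the diff refers to: in a Mixtral-style MoE, a small linear gate scores each token against the experts and only the top-k experts process it. The sketch below is a minimal, hypothetical illustration in PyTorch (the class name, hidden size, expert count, and top-k choice are assumptions, not this repo's actual code, and the card does not say which merging tool was used). The point it makes is that in a conventional MoE this gate is trained jointly with the experts, whereas a frankenMoE fills the gate weights in after the experts have been merged.

```python
# Minimal sketch of a top-k MoE router ("gate"), assuming a Mixtral-style layout.
# Names, sizes, and top_k are illustrative assumptions, not this model's exact code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKRouter(nn.Module):
    """Routes each token to the top-k experts via a learned linear gate."""

    def __init__(self, hidden_size: int, num_experts: int, top_k: int = 2):
        super().__init__()
        # In a conventional MoE this projection is trained together with the experts;
        # in a frankenMoE it is filled in after merging rather than learned jointly.
        self.gate = nn.Linear(hidden_size, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, hidden_states: torch.Tensor):
        # hidden_states: (batch, seq_len, hidden_size)
        logits = self.gate(hidden_states)                      # (batch, seq, num_experts)
        weights, expert_ids = torch.topk(logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)                   # normalize over the chosen experts
        return weights, expert_ids                             # per-token expert choices and mixing weights

# Example: 4 experts (one per merged 7B model), 2 active per token, Mistral-style hidden size.
router = TopKRouter(hidden_size=4096, num_experts=4, top_k=2)
tokens = torch.randn(1, 8, 4096)
weights, expert_ids = router(tokens)
print(weights.shape, expert_ids.shape)  # torch.Size([1, 8, 2]) torch.Size([1, 8, 2])
```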