Kquant03 committed (verified)
Commit 26e93b9 · 1 Parent(s): 1f6f14a

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -6,7 +6,7 @@ tags:
 - merge
 ---
 ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/6589d7e6586088fd2784a12c/JGAuV97AOR2bCKbR2iNPc.jpeg)
-# Borked MoE :(
+# Intuition sharp as a blade
 
 A merge of [fblgit/UNA-TheBeagle-7b-v1](https://huggingface.co/fblgit/UNA-TheBeagle-7b-v1), [Open-Orca/Mistral-7B-OpenOrca](https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca), [samir-fama/FernandoGPT-v1](https://huggingface.co/samir-fama/FernandoGPT-v1) and [Neuronovo/neuronovo-7B-v0.3](https://huggingface.co/Neuronovo/neuronovo-7B-v0.3).
 
@@ -43,4 +43,4 @@ If all our tokens are sent to just a few popular experts, that will make trainin
 
 
 ## "Wait...but you called this a frankenMoE?"
-The difference between MoE and "frankenMoE" lies in the fact that the router layer in a model like the one on this repo is not trained simultaneously. There are rumors about someone developing a way for us to unscuff these frankenMoE models by training the router layer simultaneously.
+The difference between MoE and "frankenMoE" lies in the fact that the router layer in a model like the one on this repo is not trained simultaneously. There are rumors about someone developing a way for us to unscuff these frankenMoE models by training the router layer simultaneously. This model seems to overcome that.
 
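For readers unfamiliar with the "router layer" the diff refers to: in a Mixtral-style MoE, a small linear gate scores each token against the experts and only the top-k experts process it. The sketch below is a minimal, hypothetical illustration in PyTorch (the class name, hidden size, expert count, and top-k choice are assumptions, not this repo's actual code, and the card does not say which merging tool was used). The point it makes is that in a conventional MoE this gate is trained jointly with the experts, whereas a frankenMoE fills the gate weights in after the experts have been merged.

```python
# Minimal sketch of a top-k MoE router ("gate"), assuming a Mixtral-style layout.
# Names, sizes, and top_k are illustrative assumptions, not this model's exact code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKRouter(nn.Module):
    """Routes each token to the top-k experts via a learned linear gate."""

    def __init__(self, hidden_size: int, num_experts: int, top_k: int = 2):
        super().__init__()
        # In a conventional MoE this projection is trained together with the experts;
        # in a frankenMoE it is filled in after merging rather than learned jointly.
        self.gate = nn.Linear(hidden_size, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, hidden_states: torch.Tensor):
        # hidden_states: (batch, seq_len, hidden_size)
        logits = self.gate(hidden_states)                      # (batch, seq, num_experts)
        weights, expert_ids = torch.topk(logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)                   # normalize over the chosen experts
        return weights, expert_ids                             # per-token expert choices and mixing weights

# Example: 4 experts (one per merged 7B model), 2 active per token, Mistral-style hidden size.
router = TopKRouter(hidden_size=4096, num_experts=4, top_k=2)
tokens = torch.randn(1, 8, 4096)
weights, expert_ids = router(tokens)
print(weights.shape, expert_ids.shape)  # torch.Size([1, 8, 2]) torch.Size([1, 8, 2])
```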