Update README.md
README.md
CHANGED
@@ -43,4 +43,4 @@ If all our tokens are sent to just a few popular experts, that will make trainin
 
 
 ## "Wait...but you called this a frankenMoE?"
-The difference between MoE and "frankenMoE" lies in the fact that the router layer in a model like the one on this repo is not trained simultaneously. There are rumors about someone developing a way for us to unscuff these frankenMoE models by training the router layer simultaneously.
+The difference between MoE and "frankenMoE" lies in the fact that the router layer in a model like the one on this repo is not trained simultaneously. There are rumors about someone developing a way for us to unscuff these frankenMoE models by training the router layer simultaneously.
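
For readers of the changed paragraph, here is a minimal sketch of what a router layer does, assuming a linear gate with top-k selection. The names `route`, `gate_weights`, and `experts` are hypothetical and not taken from this repo. In a conventionally trained MoE the gate weights would be learned together with the experts; in a frankenMoE as described above they are not, so the routing scores are only as good as whatever values the gate was given.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(hidden, gate_weights, experts, top_k=2):
    """Send one token's hidden state to its top_k experts and mix the results.

    hidden:       list of floats (the token's hidden state)
    gate_weights: one row of floats per expert (hypothetical router weights)
    experts:      one callable per expert, mapping hidden -> list of floats
    """
    # One logit per expert: dot product of the hidden state with that expert's gate row.
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in gate_weights]
    # Keep the indices of the top_k highest-scoring experts.
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Renormalize the scores of the chosen experts so they sum to 1.
    weights = softmax([logits[i] for i in chosen])
    # Weighted sum of the chosen experts' outputs.
    outputs = [experts[i](hidden) for i in chosen]
    mixed = [
        sum(w * out[d] for w, out in zip(weights, outputs))
        for d in range(len(hidden))
    ]
    return chosen, weights, mixed
```

If the gate rows are poorly matched to the experts, the same code still runs, but tokens end up at experts that are not well suited to them, which is the "scuffed" behavior the paragraph alludes to.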