Kquant03 committed on
Commit c49e0a2 · verified · 1 Parent(s): 109f406

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -43,4 +43,4 @@ If all our tokens are sent to just a few popular experts, that will make trainin
 
 
  ## "Wait...but you called this a frankenMoE?"
- The difference between MoE and "frankenMoE" lies in the fact that the router layer in a model like the one on this repo is not trained simultaneously. There are rumors about someone developing a way for us to unscuff these frankenMoE models by training the router layer simultaneously. I believe this model performs well despite these shortcomings.
+ The difference between MoE and "frankenMoE" lies in the fact that the router layer in a model like the one on this repo is not trained simultaneously. There are rumors about someone developing a way for us to unscuff these frankenMoE models by training the router layer simultaneously.