Great job, really nice!
So first of all, big congrats. The creative and text output is quite different from the typical Llamas I get, and excellent. Love the rich variety in vocabulary and descriptions. I can only fit the IQ4_XS model on my laptop, but 'tis plenty fine. What is MOE, and what are 'power levels'?
I actually have a standard creative query for the models, which I use to gauge their linguistic skills and general creativity. This model passed with flying colors.
Excellent. Thank you!
A MOE (Mixture of Experts) is, roughly speaking, a collection of models working together (or not, depending on the config). The config I used for this model activates 4 of the Dark Planet models. "Power levels" refers to raising or lowering the number of models contributing to generation. For this model specifically, that means bringing more 8B Dark Planet models online (or taking them offline), up to a maximum of 8... equal to 64B parameters.
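To make "bringing experts online" a bit more concrete, here is a toy sketch of a mixture-of-experts layer in PyTorch: all experts live inside the model, a small router scores them for each token, and only the top-k (the "power level") actually run. The sizes and layer shapes below are illustrative only, not the actual Dark Planet merge config.

```python
import torch
import torch.nn as nn

class ToyMoELayer(nn.Module):
    """Illustrative mixture-of-experts feed-forward block (not the real merge config)."""

    def __init__(self, d_model=64, d_ff=256, n_experts=8, n_active=4):
        super().__init__()
        self.n_active = n_active  # the "power level": how many experts fire per token
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores = self.router(x)                            # score every expert per token
        weights, idx = scores.topk(self.n_active, dim=-1)  # keep only the top-k experts
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.n_active):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                   # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

layer = ToyMoELayer()
print(layer(torch.randn(5, 64)).shape)  # all 8 experts are in memory, only 4 compute per token
```

Most runtimes expose the same idea as a "number of experts" setting rather than code, but the mechanics are the same: every expert's weights are loaded, and the setting only changes how many are consulted per token.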
If I use fewer than the full 8 models, will this reduce the memory requirements in practice?
This will increase tokens/second speed, as there is literally less processing happening per token.
Roughly, if you are getting 40 tokens per second @ 4 experts, 8 experts will be around 10-15 t/s.
However, the entire model is still loaded into VRAM regardless of how many experts are used/activated, so the memory requirements do not go down.
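For a rough feel of the trade-off, here is a back-of-the-envelope sketch in Python. The inverse-scaling rule, the 8 x 8B = 64B total, and the ~4.25 bits-per-weight figure for an IQ4_XS-level quant are assumptions tuned to the numbers quoted above, not measurements.

```python
def rough_tps(active_experts, base_tps=40.0, base_experts=4):
    """Naive rule: per-token compute grows with the number of active experts,
    so tokens/sec falls roughly in proportion. Real runs land a bit lower
    (routing and memory traffic add overhead), e.g. 10-15 t/s at 8 experts."""
    return base_tps * base_experts / active_experts

def approx_vram_gb(total_params_billions=64, bits_per_weight=4.25):
    """Every expert's weights stay resident, so memory tracks TOTAL parameters,
    not how many experts are activated. 64B total (8 x 8B) and ~4.25 bpw for an
    IQ4_XS-style quant are rough figures; KV cache and runtime overhead are ignored."""
    return total_params_billions * bits_per_weight / 8

for k in (2, 4, 8):
    print(f"{k} experts active: ~{rough_tps(k):.0f} t/s, ~{approx_vram_gb():.0f} GB resident")
# Speed moves with the expert count; the ~34 GB footprint does not.
```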