trashpanda-org/QwQ-32B-Snowdrop-v0 · This model is a godsend.

Q4 performs very well; however, I highly recommend running Q5 if you can, even if you have to offload. It has noticeably better prose flow than Q4 and is a bit better at extracting meaning from facts. For example, where Q4 might list a character's traits in the thinking stage, effectively repeating the context, and then just extrapolate a generalization, Q5 is more likely to immediately write what those traits imply for the current scene, skipping the redundant word munching and avoiding the generalization. Of course, once the thinking is done, it doesn't really matter what was in there; all we care about is the resulting message. But the difference I described carries over to the message itself as well.

That said, if neither Q5 nor Q4 is feasible for you, IQ3_XS is still worth running. Surprisingly, it doesn't feel that much worse than Q4. The prose is rougher and the thinking a bit shallower, but the thinking stage still lets it punch above the typical weight of a Q3, compared to models that can't properly use one.

Q6 didn't feel any different from Q5, but take that with a grain of salt. I don't think it's worth the extra size this time.

Haven't bothered to try Q8. If you're thinking about Q8, you probably have the means to just run it and not care about the other quants anyway.