No, it does not include XS. The reason Q4_0 and IQ4_NL work, I think, is that they don't do any clever packing of the scaling factors; that's why K-quants and IQ4_XS (which is like NL but with some K-quant logic) don't work yet.
bartowski (PRO), official model curator for https://lmstudio.ai/
Replied to their post about 4 hours ago:
Looks like Q4_0_N_M file types are going away
Before you panic: there's a new "preferred" method, online (I prefer the term on-the-fly) repacking. If you download Q4_0 and your setup can benefit from repacking the weights into interleaved rows (which is what Q4_0_4_4 was doing), it will do that automatically and give you similar performance (minor losses, I think, due to using intrinsics instead of assembly, but intrinsics are more maintainable).
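To make "repacking the weights into interleaved rows" concrete, here's a toy sketch of the idea (this is an illustrative NumPy example, not llama.cpp's actual code or block layout): blocks from 4 consecutive rows are laid next to each other in memory, so a SIMD kernel can process one block from each of the 4 rows with a single contiguous read.

```python
import numpy as np

def repack_interleave(rows: np.ndarray, group: int = 4, block: int = 32) -> np.ndarray:
    """Interleave blocks from `group` consecutive rows (toy sketch of Q4_0_4_4's idea).

    rows: (n_rows, n_cols) array, n_rows % group == 0, n_cols % block == 0.
    Returns one packed buffer per row-group, with the groups' blocks interleaved.
    """
    n_rows, n_cols = rows.shape
    assert n_rows % group == 0 and n_cols % block == 0
    n_blocks = n_cols // block
    # Split into (row_groups, group, n_blocks, block), then swap axes so the
    # same-index block from each of the `group` rows becomes contiguous:
    # (row_groups, n_blocks, group, block)
    x = rows.reshape(n_rows // group, group, n_blocks, block)
    return x.transpose(0, 2, 1, 3).reshape(n_rows // group, n_cols * group)

w = np.arange(4 * 64).reshape(4, 64)  # 4 rows, each holding 2 blocks of 32 values
packed = repack_interleave(w)
# packed[0, :32]   is row 0's first block,
# packed[0, 32:64] is row 1's first block, and so on.
print(packed.shape)  # (1, 256)
```

The point of the transform is purely memory layout: the values are unchanged, but a kernel multiplying 4 output rows at once no longer has to gather blocks from 4 strided locations.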
You can see the reference PR here:
https://github.com/ggerganov/llama.cpp/pull/10446
So if you update your llama.cpp past that point, you won't be able to run Q4_0_4_4 (unless they add backwards compatibility), but Q4_0 should run at the same speeds (though it may currently be bugged on some platforms).
As such, I'll stop making those newer model formats soon, probably end of this week unless something changes, but you should be safe to download the Q4_0 quants and use those!
Also, IQ4_NL supports repacking, though not in as many shapes yet, but it should get a respectable speedup on ARM chips. The PR for that can be found here: https://github.com/ggerganov/llama.cpp/pull/10541
Remember, these are not meant for Apple silicon, since those chips use the GPU and don't benefit from the repacking of weights.