Bartowski PRO
bartowski
AI & ML interests
Official model curator for https://lmstudio.ai/
Recent Activity
replied to their post about 5 hours ago
Looks like Q4_0_N_M file types are going away
Before you panic: there's a new "preferred" method, which is online (I prefer the term on-the-fly) repacking. If you download Q4_0 and your setup can benefit from repacking the weights into interleaved rows (what Q4_0_4_4 was doing), it will do that automatically and give you similar performance (minor losses, I think, due to using intrinsics instead of assembly, but intrinsics are more maintainable)
You can see the reference PR here:
https://github.com/ggerganov/llama.cpp/pull/10446
So if you update your llama.cpp past that point, you won't be able to run Q4_0_4_4 (unless they add backwards compatibility back), but Q4_0 should run at the same speed (though it may currently be bugged on some platforms)
As such, I'll stop making those newer model formats soon, probably by the end of this week unless something changes, but you should be safe to download Q4_0 quants and use those!
IQ4_NL also supports repacking, though not in as many shapes yet, but it should get a respectable speedup on ARM chips; the PR for that can be found here: https://github.com/ggerganov/llama.cpp/pull/10541
Remember, these are not meant for Apple silicon since those use the GPU and don't benefit from the repacking of weights
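If you want to sanity-check this on your own machine, here's a minimal sketch (assuming the llama-cpp-python bindings and huggingface_hub; the repo and filename below are hypothetical placeholders, point them at whichever quant repo you actually want) that downloads a plain Q4_0 quant and loads it. On a build that includes the repacking PR, the repack into interleaved rows happens automatically at load time on CPUs that support it, so there's nothing extra to configure:

```python
# Minimal sketch: download a plain Q4_0 GGUF and load it with llama-cpp-python.
# Repo and filename are hypothetical placeholders.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

model_path = hf_hub_download(
    repo_id="bartowski/SomeModel-GGUF",   # hypothetical repo
    filename="SomeModel-Q4_0.gguf",       # plain Q4_0, not Q4_0_4_4
)

# On llama.cpp builds that include the online-repacking PR, loading a Q4_0
# file on a CPU that benefits from the interleaved layout repacks the weights
# on the fly; no special flag is needed here.
llm = Llama(model_path=model_path, n_ctx=4096, n_threads=8)

print(llm("Q: What is 2+2? A:", max_tokens=16)["choices"][0]["text"])
```

As noted above, this path doesn't really apply on Apple Silicon, since inference there runs on the GPU and doesn't use the repacked CPU layout.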
new activity about 13 hours ago on Qwen/QVQ-72B-Preview: GGUF weights?
updated a model about 22 hours ago: bartowski/QVQ-72B-Preview-GGUF
bartowski's activity
GGUF weights? · 7 · #1 opened 1 day ago by luijait
Question on target-language quantization · 1 · #1 opened 6 days ago by robbiemu
Quant_Request · 1 · #2 opened 5 days ago by Cran-May
Question about the example run command · 1 · #3 opened 4 days ago by elifeinberg
Is it possible? · 2 · #1 opened 2 days ago by KatyTheCutie
Request to bring back Q4_1 · 38 · #299 opened 3 months ago by yttria
Llama 3.1 lumi maid crashing Kobold · 6 · #29 opened 14 days ago by Keionsa
Tokenizer issues · 1 · #1 opened 8 days ago by eleius
Extra `<s>` in EOG and EOS. · 3 · #1 opened 7 days ago by notafraud
qwen2.5-coder-14b-instruct@iq4_xs Quantized Model Fails to Follow Instructions on LMStudio · 1 · #1 opened 4 days ago by anrgct
Problem in running with vllm · 1 · #4 opened 4 days ago by babakgh
"Supports a context length of 160k through yarn settings." · 1 · #1 opened 6 days ago by mclassHF2023
Quant Request · 2 · #1 opened 7 days ago by mt114514
Upload mmproj-Qwen2-VL-2B-Instruct-f16.gguf · 3 · #1 opened 10 days ago by stduhpf
GGUF · 2 · #1 opened 9 days ago by MikeLightheart
"Error: llama runner process has terminated", when running "ollama run" · 2 · #2 opened 9 days ago by yuguanggao
gguf · 2 · #4 opened 4 months ago by goodasdgood
Missing Tensors in Q5_K_S + Q4_K_M · 7 · #3 opened 13 days ago by Joan8652
Quant Request · 2 · #1 opened 12 days ago by Cran-May