I get too many repetitions
I'm using the Q8 quantized model with llama.cpp and I still get too many repetitions.
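For reference, a minimal sketch of a setup like mine, assuming llama-cpp-python (the GGUF path is a placeholder, and the sampling values are just the usual anti-repetition knobs):

```python
# Minimal repro sketch using llama-cpp-python (pip install llama-cpp-python).
# The model path is a placeholder; the sampling values are typical settings
# one would try first to reduce repetition.
from llama_cpp import Llama

llm = Llama(
    model_path="./model-Q8_0.gguf",  # placeholder path to the Q8 quant
    n_ctx=4096,                      # context window
)

out = llm(
    "Create a misty Jiangnan scene using SVG.",
    max_tokens=512,
    temperature=0.7,      # lowering temperature alone often isn't enough
    top_p=0.9,
    repeat_penalty=1.15,  # llama-cpp-python's default is 1.1; raising it is the usual mitigation
)
print(out["choices"][0]["text"])
```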
Unfortunately, this model currently doesn't work well with llama.cpp. 😢
Does it work well when used directly, without quantization?
The reason we haven't released a quantized model is that we encountered serious loss after quantization, and we are looking into how to solve it. Currently, using a quantized model directly in llama.cpp results in serious performance loss; it cannot complete even basic tasks.
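To give a sense of where such loss can come from (a toy illustration only, not our actual pipeline): with naive absmax int8 quantization, a single weight outlier inflates the scale for the whole tensor, so even Q8 can lose noticeable precision on the bulk of the values.

```python
# Toy illustration of quantization round-trip error (not the model's real pipeline).
import numpy as np

def quantize_int8_absmax(w: np.ndarray) -> np.ndarray:
    """Symmetric per-tensor int8 quantization: scale by max |w|, round, dequantize."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w_smooth = rng.normal(0, 0.02, size=4096).astype(np.float32)
w_outlier = w_smooth.copy()
w_outlier[0] = 2.0  # a single outlier inflates the scale for the whole tensor

for name, w in [("smooth", w_smooth), ("with outlier", w_outlier)]:
    err = np.abs(w - quantize_int8_absmax(w)).mean()
    print(f"{name}: mean abs round-trip error = {err:.6f}")
```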
I fell in love with your models a long time ago. They are great models, but they are like forbidden fruit for me, because I cannot use them without proper GGUF support. 😢
If you could spare some time to assist those working on GGUF inference engines such as llama.cpp with implementing proper support for your models, please do so. I would appreciate it very much, and I'm sure many others would as well! ❤
I absolutely love the screenshots of the content your models can generate. They are lovely and stunning, full of extra detail I would not expect from such simple prompts! I'd also like to thank you for publishing the prompts used to generate them. With those prompts I was able to test various models on lmarena for comparison. My favorite is "Create a misty Jiangnan scene using SVG.", and I was very impressed by the output of your model:
It may use only simple shapes, but overall the image is beautiful and detailed. When I tested the same prompt with much larger commercial models, they either failed completely or produced images that were not as detailed and pretty as the one from your model.
For example, this is from o3-mini, using the same prompt:
I think your model would be a real gem, a real star among models for local inference, if only we could use it in llama.cpp, which is the only way I can personally run these models.
We have received a large number of requests for quantized versions, and we are coordinating with staff to complete calibrated quantization within a certain timeframe, especially for the 32B model. However, I still don't know how long it will take.
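For anyone wondering what calibrated quantization means here, a toy sketch (illustrative only; real calibration runs representative data through the model, e.g. llama.cpp's importance-matrix (imatrix) approach): choosing the clipping range from calibration statistics rather than the raw absmax trades a little clipping error on rare outliers for much finer resolution on the bulk of the values.

```python
# Toy sketch of calibration: pick the int8 clipping range from a percentile of
# sample ("calibration") data instead of the raw absmax. Illustrative only,
# not the team's actual method.
import numpy as np

def roundtrip(x: np.ndarray, clip: float) -> np.ndarray:
    scale = clip / 127.0
    q = np.clip(np.round(x / scale), -127, 127)
    return q * scale

rng = np.random.default_rng(2)
x = rng.normal(0, 0.05, size=100_000)
x[:10] = 5.0  # rare activation outliers

absmax_clip = np.abs(x).max()                 # naive: outliers set the scale
calib_clip = np.percentile(np.abs(x), 99.9)   # calibrated: ignore the rare tail

for name, clip in [("absmax", absmax_clip), ("calibrated", calib_clip)]:
    err = np.abs(x - roundtrip(x, clip)).mean()
    print(f"{name}: clip={clip:.3f}, mean abs error = {err:.6f}")
```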