Totally unusable from branch 4bit 32g (screenshots included)
#15 opened by anon7463435254
Hmm yeah you're right. AutoGPTQ is producing gibberish with this file.
In any case I would recommend you use ExLlama as the Loader, as it will be much faster than AutoGPTQ. And it works fine with this file; I just tested it.
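(If you're running text-generation-webui, you can pick ExLlama from the Loader dropdown on the Model tab, or pass `--loader exllama` when launching `server.py`. I'm assuming a recent webui build here, where that flag exists.)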
But I need to investigate why AutoGPTQ cannot do inference from this file, and I will report that as a bug.
It's a bug in AutoGPTQ 0.3.0.
If you really want to use AutoGPTQ for some reason, please downgrade to AutoGPTQ 0.2.2 and it will work - but it will be slow.
I will report this as a bug in AutoGPTQ, but I don't know when it might be fixed.
So, to summarise:
- I recommend you use ExLlama anyway, as it is faster
- If you really want to use AutoGPTQ, downgrade to 0.2.2 (see the sketch after this list)
- I have raised this as a bug in 0.3.0, which you can track here: https://github.com/PanQiWei/AutoGPTQ/issues/201
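If you do take the downgrade route, here's a rough sanity-check sketch after `pip install auto-gptq==0.2.2` - not a definitive recipe: `model_dir` and `basename` are placeholders for wherever you downloaded the 4bit-32g branch and whatever the quant file is actually called, and I'm assuming the branch ships a .safetensors file:

```python
# Rough sketch: check that the GPTQ file loads and generates coherent
# text after downgrading AutoGPTQ (pip install auto-gptq==0.2.2).
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_dir = "path/to/4bit-32g-branch"  # placeholder: your local download
basename = "gptq_model-4bit-32g"       # placeholder: match the .safetensors name

tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(
    model_dir,
    model_basename=basename,
    device="cuda:0",
    use_safetensors=True,  # assuming the branch ships a .safetensors quant
    use_triton=False,
)

prompt = "Tell me about AI"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0]))
```

If 0.2.2 produces coherent output here while 0.3.0 produces gibberish from the same file, that confirms the regression is in the loader rather than in the quant itself.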
Thank you very much, man. I also found a possible bug with the GGML files. Hoping to help, I'm going to open a discussion on the 13B-chat-ggml repo.
anon7463435254 changed discussion status to closed