Totally unusable from branch 4bit 32g (screenshots included)
#15 opened by anon7463435254
Hmm yeah you're right. AutoGPTQ is producing gibberish with this file.
In any case I would recommend you use ExLlama as the Loader, as it will be much faster than AutoGPTQ. And it works fine with this file; I just tested it.
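(If you're running text-generation-webui, you can pick ExLlama from the Loader dropdown on the Model tab, or pass `--loader exllama` when launching `server.py`. I'm assuming a recent webui build here, where that flag exists.)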
But I need to investigate why AutoGPTQ cannot do inference from this file, and I will report that as a bug.
It's a bug in AutoGPTQ 0.3.0.
If you really want to use AutoGPTQ for some reason, please downgrade to AutoGPTQ 0.2.2 and it will work - but it will be slow.
I will report this as a bug in AutoGPTQ, but I don't know when it might be fixed.
So, to summarise:
- I recommend you use ExLlama anyway, as it is faster
- If you really want to use AutoGPTQ, downgrade to 0.2.2 (see the sketch after this list)
- I have raised this as a bug in 0.3.0, which you can track here: https://github.com/PanQiWei/AutoGPTQ/issues/201
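If you do take the downgrade route, here's a rough sanity-check sketch after `pip install auto-gptq==0.2.2` - not a definitive recipe: `model_dir` and `basename` are placeholders for wherever you downloaded the 4bit-32g branch and whatever the quant file is actually called, and I'm assuming the branch ships a .safetensors file:

```python
# Rough sketch: check that the GPTQ file loads and generates coherent
# text after downgrading AutoGPTQ (pip install auto-gptq==0.2.2).
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_dir = "path/to/4bit-32g-branch"  # placeholder: your local download
basename = "gptq_model-4bit-32g"       # placeholder: match the .safetensors name

tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(
    model_dir,
    model_basename=basename,
    device="cuda:0",
    use_safetensors=True,  # assuming the branch ships a .safetensors quant
    use_triton=False,
)

prompt = "Tell me about AI"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0]))
```

If 0.2.2 produces coherent output here while 0.3.0 produces gibberish from the same file, that confirms the regression is in the loader rather than in the quant itself.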
Thank you very much, man. I also found a possible bug with the GGML files. Hoping to help, I'm going to open a discussion on the 13B-chat-ggml repo.
anon7463435254 changed discussion status to closed