Most of the time, this model doesn't work for me. Please help!
Hello,
I have issues using this model in inference UI apps like Faraday and LM Studio. Most of the time, the model just doesn't work. At first it looks promising: it loads into RAM and VRAM and the CPU starts processing. Then, after a moment, everything stops, the model unloads from both RAM and VRAM, and nothing is generated. Yesterday I was able to load the model in LM Studio and even use it for a while, but after some time it stopped working again and kept giving me errors saying the model had failed.
I've read that there were some problems with the quantization process for this model, as discussed here: https://huggingface.co./NousResearch/Nous-Hermes-Llama2-13b/discussions/1. I don't know whether my problem has anything to do with that, but I'd really like to have a working version of this model.
I discussed this issue with another user who tested the model for me on his own hardware. He has different specs (Intel + Nvidia; my specs are at the bottom of this post), and he was able to use the model. I'm starting to feel like I'm the only one with this issue, and it feels a bit ridiculous: I normally use all kinds of 13B GGML models without trouble, yet I just can't get this one to work. Is there anything I could do to fix the problem? Any help would be appreciated, thanks!
My specs:
OS: Windows 10 64bit
CPU: AMD Ryzen 2700x
RAM: 16 GB
GPU: AMD Radeon RX Vega 56
VRAM: 8 GB
Model:
Nous-Hermes-Llama2-13b-GGML (Q4_K_M version)
Same for me, it doesn't work at all. It gives incoherent answers most of the time.
My issue is different. The model unloads after a while and doesn't even begin to generate anything, so the source of your issue is most likely completely different. Check your settings and system prompt to see whether you're using the suggested prompt format.