Reconverted and requantized with latest GGUF to fix llama3 tokenizer

#5

Hi, I've taken the liberty of requantizing your model to fix the llama3 tokenizer.

Feel free to use these files, or replace them with your own.
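For reference, the workflow was roughly the following. This is just a sketch: the paths are placeholders, and the converter script and quantize binary names have changed across llama.cpp versions (the quantize binary was later renamed llama-quantize).

```python
# Sketch of a reconvert + requantize pass with llama.cpp's tools.
# Assumptions: a llama.cpp checkout as the working directory, and an
# HF-format model at MODEL_DIR; script/binary names vary by version.
import subprocess

MODEL_DIR = "path/to/hf-model"   # placeholder: local HF snapshot
F16_GGUF = "model-f16.gguf"

# 1) Convert the HF model to an f16 GGUF with the current converter,
#    which picks up the fixed llama3 (BPE) tokenizer handling.
subprocess.run(
    ["python", "convert-hf-to-gguf.py", MODEL_DIR,
     "--outtype", "f16", "--outfile", F16_GGUF],
    check=True,
)

# 2) Quantize the f16 GGUF down to the desired quant type.
subprocess.run(
    ["./quantize", F16_GGUF, "model-Q4_K_M.gguf", "Q4_K_M"],
    check=True,
)
```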

Thanks, appreciate that! I'll take a look; I haven't had time to make sure everything is OK.

I've found a bug in the GGUF conversion. This might affect all fine-tuned models, and any bfloat16 weights that have been converted to GGUF.

https://github.com/ggerganov/llama.cpp/issues/7062

Great catch @Orenguteng, there have been quite a few changes lately. This might indeed affect the quants if CUDA was active.

@algorithm The issue is that it will probably affect things even on CPU, though to a much lesser degree, because of the bfloat16->float16 conversion. I've noticed that it mostly affects LoRA fine-tunes.
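To illustrate why that conversion is lossy, here's a minimal NumPy sketch (not llama.cpp's actual code): bfloat16 keeps float32's 8-bit exponent, while float16 only reaches about 65504 and flushes values below roughly 6e-8 to zero, so unusually large or tiny bf16 weights get clipped on the way down.

```python
import numpy as np

# Example weights: all representable in bfloat16, problematic in float16.
w_f32 = np.array([1e10, 70000.0, 1e-8, 3.14159], dtype=np.float32)

# Simulate bfloat16 storage: bfloat16 is just the top 16 bits of a float32.
bf16_bits = w_f32.view(np.uint32) >> 16
w_bf16 = (bf16_bits << 16).view(np.float32)

# The conversion path under discussion: bfloat16 -> float16.
w_f16 = w_bf16.astype(np.float16)

for orig, bf, fp in zip(w_f32, w_bf16, w_f16):
    print(f"f32={orig:<12g} bf16={bf:<12g} f16={fp:<12g}")
# 1e10 and 70000.0 overflow to inf in float16; 1e-8 underflows to 0;
# 3.14159 survives (slightly rounded in bf16, representable in f16).
```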

@algorithm https://www.reddit.com/r/LocalLLaMA/comments/1ckvx9l/part2_confirmed_possible_bug_llama3_gguf/

It seems GGUFs are broken. This is huge. It's not about CPU or GPU; it happens regardless. AWQ tested in 4-bit produces the correct output, so something is broken in GGUF and llama.cpp.

@Orenguteng Very interesting. I agree this is a big deal, and yes, it's regardless of CPU or GPU. I'm keeping an eye on the GitHub issue as we speak. I hope they'll narrow down the problem. Thanks for letting me know!

Has the issue been fixed? Is it safe to download the model now? (Noob question) What's the difference between this and the original, besides being GGUF and supposedly uncensored?

I too am wondering about this, as there is so much high-level discussion that's way above my head! I just want me some sweet, uncensored GGUF of L3, but all this talk of quanting and prompts and stop strings and whatnot just gives me a headache - and I'm not alone!

We're relying on you @Orenguteng ! *puppy dog eyes*

@dadadies The issue has been closed. It seems https://huggingface.co./meta-llama/Meta-Llama-3-8B-Instruct/tree/main updated their tokenizer 18 hours ago, and there are still some issues. You can safely download this and use it as you wish until a better version with a fixed tokenizer is released. This one was an early release and works well enough, but it will get better.
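If anyone wants to sanity-check a GGUF's tokenizer themselves, a rough approach is to compare token IDs against the HF tokenizer. A sketch using llama-cpp-python and transformers (the model paths are placeholders, and parameter names are as of llama-cpp-python around this time):

```python
from llama_cpp import Llama
from transformers import AutoTokenizer

# Placeholders -- point these at the GGUF and HF repo you actually use.
GGUF_PATH = "path/to/model-Q4_K_M.gguf"
HF_REPO = "meta-llama/Meta-Llama-3-8B-Instruct"

text = "Hello world! <|eot_id|> testing 123"

# vocab_only=True loads just the tokenizer, not the weights.
llm = Llama(model_path=GGUF_PATH, vocab_only=True)
gguf_ids = llm.tokenize(text.encode("utf-8"), add_bos=False, special=True)

hf_tok = AutoTokenizer.from_pretrained(HF_REPO)
hf_ids = hf_tok.encode(text, add_special_tokens=False)

print("match" if gguf_ids == hf_ids else f"mismatch:\n{gguf_ids}\n{hf_ids}")
```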

Orenguteng changed pull request status to closed
