Byte not found in vocab
I've been trying to convert this model to GGUF so I can run it on a Raspberry Pi 5.
I have tried two different approaches to the llama.cpp conversion:
- create vocab.json and merges.txt using the HF Tokenizer > create an extended tokenizer.model > convert to GGUF (see the sketch after this list)
- use the modified conversion script from this PR: https://github.com/ggerganov/llama.cpp/pull/3633 > create the GGUF from tokenizer.json alone
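For reference, the first step of the first approach looked roughly like this. It is only a sketch, assuming a BPE-based tokenizer.json sits in the working directory:

```python
# Sketch: dump vocab.json and merges.txt from a fast tokenizer
# (assumes a BPE-based tokenizer.json in the current directory).
from tokenizers import Tokenizer

tok = Tokenizer.from_file("tokenizer.json")
# For a BPE model this writes vocab.json and merges.txt into the folder
files = tok.model.save(".")
print(files)
```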
I have tried different quantizations (Q5_0, Q4_K_M, Q4_0). For some reason all approaches end in the same result: when I try to load the model, I get the error 'Byte not found in vocab'. Do you have any idea what this could be related to? The original model kind of works with Transformers, but it is way too slow for the RPi.
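In case it helps with debugging: my understanding is that llama.cpp raises this error when it looks up SentencePiece-style byte-fallback tokens (the <0xXX> pieces) and the exported vocab does not contain them, which can happen when a plain BPE vocab is loaded as if it were SPM. A quick sanity check on the exported vocab (just a sketch, the vocab.json file name is assumed):

```python
# Sketch: count SentencePiece-style byte-fallback tokens (<0x00> .. <0xFF>)
# in an exported vocab.json; an SPM-style vocab should contain all 256.
import json
import re

with open("vocab.json", encoding="utf-8") as f:
    vocab = json.load(f)

byte_tokens = [t for t in vocab if re.fullmatch(r"<0x[0-9A-Fa-f]{2}>", t)]
print(f"{len(byte_tokens)} / 256 byte-fallback tokens present")
```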
Hi, did you manage to create that tokenizer.model? Could you share it? I'm trying to make a GGUF file too, but as far as I understand it needs the tokenizer.model that has been removed from this repo. Or can you point me to a guide for creating it?
EDIT: I got the quantization to work without tokenizer.model. I used the PR 3633 script you linked. The problem was that I was trying to load the model with the llama.cpp loader in oobabooga, but it worked with llamacpp_HF instead.
EDIT 2: Correction, it does need tokenizer.model to run with llamacpp_HF. But it seems to work somehow even with a faulty tokenizer.model in the same folder. Another important step was to update the GGUF file type to the current version if the older version is unsupported by another application:

```
# COPY re-serializes the file in the current GGUF format without re-quantizing
./quantize ./models/7B/ggml-model-q4_0.gguf ./models/7B/ggml-model-q4_0-v2.gguf COPY
```
Cool, you actually got it working! I will have to try again; I'm not sure whether I did the version conversion you mentioned at the end.
FYI, the tokenizer.model conversion was done using the instructions from here: https://github.com/huggingface/tokenizers/issues/521
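I haven't re-checked the exact recipe in that issue, but the general idea for building or extending a SentencePiece tokenizer.model is to manipulate the model protobuf directly. A minimal sketch, assuming a base tokenizer.model to start from (the added token name is a made-up example):

```python
# Sketch: extend an existing tokenizer.model with an extra piece via the
# sentencepiece protobuf API (requires the protobuf package).
from sentencepiece import sentencepiece_model_pb2 as sp_pb2

m = sp_pb2.ModelProto()
with open("tokenizer.model", "rb") as f:
    m.ParseFromString(f.read())

piece = sp_pb2.ModelProto.SentencePiece()
piece.piece = "<my_new_token>"  # hypothetical token, for illustration only
piece.score = 0.0
m.pieces.append(piece)

with open("tokenizer_extended.model", "wb") as f:
    f.write(m.SerializeToString())
```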
Yesterday I tested the conversions again on my laptop. I also installed oobabooga's web UI on my PC, and the model indeed works on Windows using llamacpp_HF, both with GPU and CPU-only configs.
I don't get why the exact same model won't work on Linux/aarch64. It shouldn't be a memory issue, because other 3B models are working great.
Oh well, I will have to keep experimenting.