tokenizer.model file
I'm trying to convert the model to GGML, but the tokenizer.model file is not included in the repo. Using the LLaMA 2 tokenizer.model results in the error "Expected added token IDs to be sequential". I'd appreciate a pointer to the correct tokenizer.model file.
Can't you get it from the original model?
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5")
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5")
Correct, but I don't know how to save it as the tokenizer.model file required by the convert.py script.
Thanks a lot for this information and sorry for the newbie question. I appreciate any link or tutorial for doing so. A Google search led me to this page https://github.com/guidance-ai/guidance/issues/58 but I could not get much out of it.
It won't be easy to port the model to GGML, but it's not impossible. You can take a look at how GPT-J is implemented here (and refer to the HF implementation here), then try to adapt the Phi model the same way. The modeling code for Phi is in this repo.