tokenizer.model file
I'm trying to convert the model to GGML, but the tokenizer.model file is not included in the repo. Using the LLaMA 2 tokenizer.model results in the error "Expected added token IDs to be sequential". I'd appreciate a pointer to the correct tokenizer.model file.
Can't you get it from the original model?
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5")
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5")
Correct, but I don't know how to save it as the tokenizer.model file required by the convert.py script.
Thanks a lot for this information and sorry for the newbie question. I appreciate any link or tutorial for doing so. A Google search led me to this page https://github.com/guidance-ai/guidance/issues/58 but I could not get much out of it.
It won't be easy to port the model to GGML, but it's not impossible. You can take a look at how GPT-J is implemented here (and refer to the HF implementation here), then try to adapt the Phi model the same way. The modeling code for Phi is in this repo.