create_tensor: tensor 'blk.0.ffn_gate.weight' not found
Hi!
I get the following error when loading mixtral-8x7b-v0.1.Q8_0.gguf:
llm_load_tensors: ggml ctx size = 0.32 MB
llm_load_tensors: using CUDA for GPU acceleration
ggml_cuda_set_main_device: using device 0 (NVIDIA GeForce RTX 3090) as main device
error loading model: create_tensor: tensor 'blk.0.ffn_gate.weight' not found
llama_load_model_from_file: failed to load model
2023-12-11 16:55:48 ERROR:Failed to load the model.
...
File "/env/lib/python3.10/site-packages/llama_cpp_cuda/llama.py", line 365, in init
assert self.model is not None
AssertionError
Any idea?
PS: TheBloke, many, many thanks for your work and time!
You need to use the llama.cpp fork with mixtral support: https://github.com/ggerganov/llama.cpp/tree/mixtral
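If it helps, the full sequence on Linux with CUDA is roughly the following (a sketch assuming a make-based build; adjust flags for your setup):
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
git checkout mixtral
make LLAMA_CUBLAS=1
That produces a main binary in the repo root that you can point at the GGUF file.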
Is there not a Windows 10 binary compiled for this???
Hi, I'm using the right branch (latest pull from mixtral) but still getting the same error:
llm_load_tensors: ggml ctx size = 0.36 MiB
llm_load_tensors: using CUDA for GPU acceleration
error loading model: create_tensor: tensor 'blk.0.ffn_gate.weight' not found
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '/mnt/e/mixtral-8x7b-v0.1.Q4_K_M.gguf'
main: error: unable to load model
In the list of tensors printed by llama_model_loader when running main, I don't see this tensor. I only see tensors like blk.0.ffn_gate.0.weight.
Am I missing anything here?
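For what it's worth, a quick way to peek at the tensor names stored in a GGUF file (a rough heuristic: the names are kept as plain text in the file header, so strings picks them up):
strings mixtral-8x7b-v0.1.Q4_K_M.gguf | grep ffn_gate | head
In my case that only shows the per-expert names like blk.0.ffn_gate.0.weight.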
I had the same issue on an Apple M2 Max (Metal), and it was solved by pulling the mixtral branch of llama.cpp instead of master and then rebuilding. But you're right that the supposedly missing tensor doesn't appear in the list of created tensors even when it works!
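In case it helps anyone else, the rebuild that fixed it for me was roughly this (a sketch assuming a make-based build; Metal is enabled by default on Apple Silicon):
cd llama.cpp
git fetch origin
git checkout mixtral
git pull
make clean
make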
I tried a clean build multiple times but still no luck. Should the mixtral branch work as is, or are there additional changes or patches required? Any help is greatly appreciated.
llm_load_tensors: ggml ctx size = 0.36 MiB
llm_load_tensors: using CUDA for GPU acceleration
error loading model: create_tensor: tensor 'blk.0.ffn_gate.weight' not found
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '/mnt/models/mixtral-8x7b-v0.1.Q4_K_M.gguf'
main: error: unable to load model
.../llama.cpp/build$ git status
On branch mixtral
nothing to commit, working tree clean
Final update: I got it working eventually. For some reason building from the branch wasn't working at first, but after a few tries it worked and I can now load the models correctly.
You need to use the llama.cpp fork with mixtral support: https://github.com/ggerganov/llama.cpp/tree/mixtral
Can you please explain more? How do we do it?
How do I make it use the newly downloaded llama.cpp? (I also think we don't need the mixtral branch anymore, since I believe it was merged.) Where do I install it?
- Clone the repo as per normal:
git clone https://github.com/ggerganov/llama.cpp
- Before you build, run:
git checkout mixtral
- Build as per normal.
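Once it builds, you can sanity-check the load with something like this (paths and flags are just examples; -ngl sets how many layers to offload to the GPU):
./main -m /path/to/mixtral-8x7b-v0.1.Q4_K_M.gguf -p "Hello" -n 64 -ngl 20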