
Incorrect vocab size?

#2
by claudiuv - opened

First of all, the model is really neat, thank you for sharing!
Second, I tried to convert the model to the GGUF format using the convert.py script from llama.cpp (https://github.com/ggerganov/llama.cpp/blob/master/convert.py) and got an error about the vocab size:

Exception: Vocab size mismatch (model has 32256, but Magicoder-S-DS-6.7B/tokenizer.model combined with Magicoder-S-DS-6.7B/added_tokens.json has 32022)

Is it possible that you updated tokenizer.model and added_tokens.json but forgot to update config.json? It seems that if I change the vocab_size value in config.json from the original 32256 to 32022, the conversion succeeds, but I'm not sure whether this breaks anything.
Any answer or tip would be highly appreciated.
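
For anyone who wants to reproduce the two numbers the converter is comparing, a rough sketch like the one below should do it. It only approximates what convert.py checks, and it assumes the sentencepiece package is installed and that the model files sit in a local Magicoder-S-DS-6.7B directory:

import json
from sentencepiece import SentencePieceProcessor

model_dir = "Magicoder-S-DS-6.7B"  # adjust to your local checkout

# Vocab size declared in config.json (32256 in this repo)
with open(f"{model_dir}/config.json") as f:
    declared = json.load(f)["vocab_size"]

# Token count the converter actually sees: base sentencepiece vocab
# plus the entries in added_tokens.json
sp = SentencePieceProcessor(model_file=f"{model_dir}/tokenizer.model")
with open(f"{model_dir}/added_tokens.json") as f:
    added = len(json.load(f))

print("config.json vocab_size:", declared)
print("tokenizer.model + added_tokens.json:", sp.vocab_size() + added)  # 32022 per the error above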

Intelligent Software Engineering (iSE) org

Thanks for your interest in Magicoder! Magicoder-S-DS-6.7B is based on deepseek-coder-6.7b-base, so the tokenizer configs and the model config should be identical. I did a quick search and found a similar issue: https://huggingface.co./TheBloke/deepseek-coder-33B-instruct-GGUF/discussions/2#654a04eb8fde27109bda19c1. Let me quote the response here:

This is not an error, just an info message which can be ignored. The same message is printed by llama.cpp and it has no impact that I've noticed

So I guess you can safely ignore the warning.

Thank you very much for the helpful reply @yuxiang630 !

@yuxiang630 , may I ask why the vocab size is inconsistent, 32256 vs. 32022?

I tried ignoring the vocab warning but I still can't get this to convert to GGUF. Their CL version works fine; it's just this DeepSeek one.

Intelligent Software Engineering (iSE) org

I tried ignoring the vocab warning but I still can't get this to convert to GGUF. Their CL version works fine; it's just this DeepSeek one.

Hi @lawls , did you check whether the base deepseek-coder 6.7B can be converted successfully, or is the problem specific to Magicoder-DS?

@yuxiang630 Theirs doesn't work either. When I convert using llama.cpp, I get notified about the wrong vocab size:

python convert.py /Users/lawls/Development/models/Magicoder-S-DS-6.7B/
Loading model file /Users/lawls/Development/models/Magicoder-S-DS-6.7B/model-00001-of-00006.safetensors
Loading model file /Users/lawls/Development/models/Magicoder-S-DS-6.7B/model-00001-of-00006.safetensors
Loading model file /Users/lawls/Development/models/Magicoder-S-DS-6.7B/model-00002-of-00006.safetensors
Loading model file /Users/lawls/Development/models/Magicoder-S-DS-6.7B/model-00003-of-00006.safetensors
Loading model file /Users/lawls/Development/models/Magicoder-S-DS-6.7B/model-00004-of-00006.safetensors
Loading model file /Users/lawls/Development/models/Magicoder-S-DS-6.7B/model-00005-of-00006.safetensors
Loading model file /Users/lawls/Development/models/Magicoder-S-DS-6.7B/model-00006-of-00006.safetensors
params = Params(n_vocab=32256, n_embd=4096, n_layer=32, n_ctx=16384, n_ff=11008, n_head=32, n_head_kv=32, n_experts=None, n_experts_used=None, f_norm_eps=1e-06, rope_scaling_type=<RopeScalingType.LINEAR: 'linear'>, f_rope_freq_base=100000, f_rope_scale=4.0, n_orig_ctx=None, rope_finetuned=None, ftype=None, path_model=PosixPath('/Users/lawls/Development/models/Magicoder-S-DS-6.7B'))
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
32016 32000
Vocab info: <VocabLoader with 32000 base tokens and 22 added tokens>
Special vocab info: <SpecialVocab with 31757 merges, special tokens {'bos': 32013, 'eos': 32014, 'pad': 32014}, add special tokens {'bos': True, 'eos': False}>
Permuting layer 0
Permuting layer 1
Permuting layer 2
Permuting layer 3
Permuting layer 4
Permuting layer 5
Permuting layer 6
Permuting layer 7
Permuting layer 8
Permuting layer 9
Permuting layer 10
Permuting layer 11
Permuting layer 12
Permuting layer 13
Permuting layer 14
Permuting layer 15
Permuting layer 16
Permuting layer 17
Permuting layer 18
Permuting layer 19
Permuting layer 20
Permuting layer 21
Permuting layer 22
Permuting layer 23
Permuting layer 24
Permuting layer 25
Permuting layer 26
Permuting layer 27
Permuting layer 28
Permuting layer 29
Permuting layer 30
Permuting layer 31
model.embed_tokens.weight                        -> token_embd.weight                        | F32    | [32256, 4096]
model.layers.0.input_layernorm.weight            -> blk.0.attn_norm.weight                   | F32    | [4096]
model.layers.0.mlp.down_proj.weight              -> blk.0.ffn_down.weight                    | F32    | [4096, 11008]
model.layers.0.mlp.gate_proj.weight              -> blk.0.ffn_gate.weight                    | F32    | [11008, 4096]
model.layers.0.mlp.up_proj.weight                -> blk.0.ffn_up.weight                      | F32    | [11008, 4096]
model.layers.0.post_attention_layernorm.weight   -> blk.0.ffn_norm.weight                    | F32    | [4096]
model.layers.0.self_attn.k_proj.weight           -> blk.0.attn_k.weight                      | F32    | [4096, 4096]
model.layers.0.self_attn.o_proj.weight           -> blk.0.attn_output.weight                 | F32    | [4096, 4096]
model.layers.0.self_attn.q_proj.weight           -> blk.0.attn_q.weight                      | F32    | [4096, 4096]
model.layers.0.self_attn.v_proj.weight           -> blk.0.attn_v.weight                      | F32    | [4096, 4096]
model.layers.1.input_layernorm.weight            -> blk.1.attn_norm.weight                   | F32    | [4096]
model.layers.1.mlp.down_proj.weight              -> blk.1.ffn_down.weight                    | F32    | [4096, 11008]
model.layers.1.mlp.gate_proj.weight              -> blk.1.ffn_gate.weight                    | F32    | [11008, 4096]
model.layers.1.mlp.up_proj.weight                -> blk.1.ffn_up.weight                      | F32    | [11008, 4096]
model.layers.1.post_attention_layernorm.weight   -> blk.1.ffn_norm.weight                    | F32    | [4096]
model.layers.1.self_attn.k_proj.weight           -> blk.1.attn_k.weight                      | F32    | [4096, 4096]
model.layers.1.self_attn.o_proj.weight           -> blk.1.attn_output.weight                 | F32    | [4096, 4096]
model.layers.1.self_attn.q_proj.weight           -> blk.1.attn_q.weight                      | F32    | [4096, 4096]
model.layers.1.self_attn.v_proj.weight           -> blk.1.attn_v.weight                      | F32    | [4096, 4096]
model.layers.2.input_layernorm.weight            -> blk.2.attn_norm.weight                   | F32    | [4096]
model.layers.2.mlp.down_proj.weight              -> blk.2.ffn_down.weight                    | F32    | [4096, 11008]
model.layers.2.mlp.gate_proj.weight              -> blk.2.ffn_gate.weight                    | F32    | [11008, 4096]
model.layers.2.mlp.up_proj.weight                -> blk.2.ffn_up.weight                      | F32    | [11008, 4096]
model.layers.2.post_attention_layernorm.weight   -> blk.2.ffn_norm.weight                    | F32    | [4096]
model.layers.2.self_attn.k_proj.weight           -> blk.2.attn_k.weight                      | F32    | [4096, 4096]
model.layers.2.self_attn.o_proj.weight           -> blk.2.attn_output.weight                 | F32    | [4096, 4096]
model.layers.2.self_attn.q_proj.weight           -> blk.2.attn_q.weight                      | F32    | [4096, 4096]
model.layers.2.self_attn.v_proj.weight           -> blk.2.attn_v.weight                      | F32    | [4096, 4096]
model.layers.3.input_layernorm.weight            -> blk.3.attn_norm.weight                   | F32    | [4096]
model.layers.3.mlp.down_proj.weight              -> blk.3.ffn_down.weight                    | F32    | [4096, 11008]
model.layers.3.mlp.gate_proj.weight              -> blk.3.ffn_gate.weight                    | F32    | [11008, 4096]
model.layers.3.mlp.up_proj.weight                -> blk.3.ffn_up.weight                      | F32    | [11008, 4096]
model.layers.3.post_attention_layernorm.weight   -> blk.3.ffn_norm.weight                    | F32    | [4096]
model.layers.3.self_attn.k_proj.weight           -> blk.3.attn_k.weight                      | F32    | [4096, 4096]
model.layers.3.self_attn.o_proj.weight           -> blk.3.attn_output.weight                 | F32    | [4096, 4096]
model.layers.3.self_attn.q_proj.weight           -> blk.3.attn_q.weight                      | F32    | [4096, 4096]
model.layers.3.self_attn.v_proj.weight           -> blk.3.attn_v.weight                      | F32    | [4096, 4096]
model.layers.4.input_layernorm.weight            -> blk.4.attn_norm.weight                   | F32    | [4096]
model.layers.4.mlp.down_proj.weight              -> blk.4.ffn_down.weight                    | F32    | [4096, 11008]
model.layers.4.mlp.gate_proj.weight              -> blk.4.ffn_gate.weight                    | F32    | [11008, 4096]
model.layers.4.mlp.up_proj.weight                -> blk.4.ffn_up.weight                      | F32    | [11008, 4096]
model.layers.4.post_attention_layernorm.weight   -> blk.4.ffn_norm.weight                    | F32    | [4096]
model.layers.4.self_attn.k_proj.weight           -> blk.4.attn_k.weight                      | F32    | [4096, 4096]
model.layers.4.self_attn.o_proj.weight           -> blk.4.attn_output.weight                 | F32    | [4096, 4096]
model.layers.4.self_attn.q_proj.weight           -> blk.4.attn_q.weight                      | F32    | [4096, 4096]
model.layers.4.self_attn.v_proj.weight           -> blk.4.attn_v.weight                      | F32    | [4096, 4096]
model.layers.5.self_attn.k_proj.weight           -> blk.5.attn_k.weight                      | F32    | [4096, 4096]
model.layers.5.self_attn.o_proj.weight           -> blk.5.attn_output.weight                 | F32    | [4096, 4096]
model.layers.5.self_attn.q_proj.weight           -> blk.5.attn_q.weight                      | F32    | [4096, 4096]
model.layers.5.self_attn.v_proj.weight           -> blk.5.attn_v.weight                      | F32    | [4096, 4096]
model.layers.10.input_layernorm.weight           -> blk.10.attn_norm.weight                  | F32    | [4096]
model.layers.10.mlp.down_proj.weight             -> blk.10.ffn_down.weight                   | F32    | [4096, 11008]
model.layers.10.mlp.gate_proj.weight             -> blk.10.ffn_gate.weight                   | F32    | [11008, 4096]
model.layers.10.mlp.up_proj.weight               -> blk.10.ffn_up.weight                     | F32    | [11008, 4096]
model.layers.10.post_attention_layernorm.weight  -> blk.10.ffn_norm.weight                   | F32    | [4096]
model.layers.10.self_attn.k_proj.weight          -> blk.10.attn_k.weight                     | F32    | [4096, 4096]
model.layers.10.self_attn.o_proj.weight          -> blk.10.attn_output.weight                | F32    | [4096, 4096]
model.layers.10.self_attn.q_proj.weight          -> blk.10.attn_q.weight                     | F32    | [4096, 4096]
model.layers.10.self_attn.v_proj.weight          -> blk.10.attn_v.weight                     | F32    | [4096, 4096]
model.layers.11.self_attn.k_proj.weight          -> blk.11.attn_k.weight                     | F32    | [4096, 4096]
model.layers.11.self_attn.o_proj.weight          -> blk.11.attn_output.weight                | F32    | [4096, 4096]
model.layers.11.self_attn.q_proj.weight          -> blk.11.attn_q.weight                     | F32    | [4096, 4096]
model.layers.11.self_attn.v_proj.weight          -> blk.11.attn_v.weight                     | F32    | [4096, 4096]
model.layers.5.input_layernorm.weight            -> blk.5.attn_norm.weight                   | F32    | [4096]
model.layers.5.mlp.down_proj.weight              -> blk.5.ffn_down.weight                    | F32    | [4096, 11008]
model.layers.5.mlp.gate_proj.weight              -> blk.5.ffn_gate.weight                    | F32    | [11008, 4096]
model.layers.5.mlp.up_proj.weight                -> blk.5.ffn_up.weight                      | F32    | [11008, 4096]
model.layers.5.post_attention_layernorm.weight   -> blk.5.ffn_norm.weight                    | F32    | [4096]
model.layers.6.input_layernorm.weight            -> blk.6.attn_norm.weight                   | F32    | [4096]
model.layers.6.mlp.down_proj.weight              -> blk.6.ffn_down.weight                    | F32    | [4096, 11008]
model.layers.6.mlp.gate_proj.weight              -> blk.6.ffn_gate.weight                    | F32    | [11008, 4096]
model.layers.6.mlp.up_proj.weight                -> blk.6.ffn_up.weight                      | F32    | [11008, 4096]
model.layers.6.post_attention_layernorm.weight   -> blk.6.ffn_norm.weight                    | F32    | [4096]
model.layers.6.self_attn.k_proj.weight           -> blk.6.attn_k.weight                      | F32    | [4096, 4096]
model.layers.6.self_attn.o_proj.weight           -> blk.6.attn_output.weight                 | F32    | [4096, 4096]
model.layers.6.self_attn.q_proj.weight           -> blk.6.attn_q.weight                      | F32    | [4096, 4096]
model.layers.6.self_attn.v_proj.weight           -> blk.6.attn_v.weight                      | F32    | [4096, 4096]
model.layers.7.input_layernorm.weight            -> blk.7.attn_norm.weight                   | F32    | [4096]
model.layers.7.mlp.down_proj.weight              -> blk.7.ffn_down.weight                    | F32    | [4096, 11008]
model.layers.7.mlp.gate_proj.weight              -> blk.7.ffn_gate.weight                    | F32    | [11008, 4096]
model.layers.7.mlp.up_proj.weight                -> blk.7.ffn_up.weight                      | F32    | [11008, 4096]
model.layers.7.post_attention_layernorm.weight   -> blk.7.ffn_norm.weight                    | F32    | [4096]
model.layers.7.self_attn.k_proj.weight           -> blk.7.attn_k.weight                      | F32    | [4096, 4096]
model.layers.7.self_attn.o_proj.weight           -> blk.7.attn_output.weight                 | F32    | [4096, 4096]
model.layers.7.self_attn.q_proj.weight           -> blk.7.attn_q.weight                      | F32    | [4096, 4096]
model.layers.7.self_attn.v_proj.weight           -> blk.7.attn_v.weight                      | F32    | [4096, 4096]
model.layers.8.input_layernorm.weight            -> blk.8.attn_norm.weight                   | F32    | [4096]
model.layers.8.mlp.down_proj.weight              -> blk.8.ffn_down.weight                    | F32    | [4096, 11008]
model.layers.8.mlp.gate_proj.weight              -> blk.8.ffn_gate.weight                    | F32    | [11008, 4096]
model.layers.8.mlp.up_proj.weight                -> blk.8.ffn_up.weight                      | F32    | [11008, 4096]
model.layers.8.post_attention_layernorm.weight   -> blk.8.ffn_norm.weight                    | F32    | [4096]
model.layers.8.self_attn.k_proj.weight           -> blk.8.attn_k.weight                      | F32    | [4096, 4096]
model.layers.8.self_attn.o_proj.weight           -> blk.8.attn_output.weight                 | F32    | [4096, 4096]
model.layers.8.self_attn.q_proj.weight           -> blk.8.attn_q.weight                      | F32    | [4096, 4096]
model.layers.8.self_attn.v_proj.weight           -> blk.8.attn_v.weight                      | F32    | [4096, 4096]
model.layers.9.input_layernorm.weight            -> blk.9.attn_norm.weight                   | F32    | [4096]
model.layers.9.mlp.down_proj.weight              -> blk.9.ffn_down.weight                    | F32    | [4096, 11008]
model.layers.9.mlp.gate_proj.weight              -> blk.9.ffn_gate.weight                    | F32    | [11008, 4096]
model.layers.9.mlp.up_proj.weight                -> blk.9.ffn_up.weight                      | F32    | [11008, 4096]
model.layers.9.post_attention_layernorm.weight   -> blk.9.ffn_norm.weight                    | F32    | [4096]
model.layers.9.self_attn.k_proj.weight           -> blk.9.attn_k.weight                      | F32    | [4096, 4096]
model.layers.9.self_attn.o_proj.weight           -> blk.9.attn_output.weight                 | F32    | [4096, 4096]
model.layers.9.self_attn.q_proj.weight           -> blk.9.attn_q.weight                      | F32    | [4096, 4096]
model.layers.9.self_attn.v_proj.weight           -> blk.9.attn_v.weight                      | F32    | [4096, 4096]
model.layers.11.input_layernorm.weight           -> blk.11.attn_norm.weight                  | F32    | [4096]
model.layers.11.mlp.down_proj.weight             -> blk.11.ffn_down.weight                   | F32    | [4096, 11008]
model.layers.11.mlp.gate_proj.weight             -> blk.11.ffn_gate.weight                   | F32    | [11008, 4096]
model.layers.11.mlp.up_proj.weight               -> blk.11.ffn_up.weight                     | F32    | [11008, 4096]
model.layers.11.post_attention_layernorm.weight  -> blk.11.ffn_norm.weight                   | F32    | [4096]
model.layers.12.input_layernorm.weight           -> blk.12.attn_norm.weight                  | F32    | [4096]
model.layers.12.mlp.down_proj.weight             -> blk.12.ffn_down.weight                   | F32    | [4096, 11008]
model.layers.12.mlp.gate_proj.weight             -> blk.12.ffn_gate.weight                   | F32    | [11008, 4096]
model.layers.12.mlp.up_proj.weight               -> blk.12.ffn_up.weight                     | F32    | [11008, 4096]
model.layers.12.post_attention_layernorm.weight  -> blk.12.ffn_norm.weight                   | F32    | [4096]
model.layers.12.self_attn.k_proj.weight          -> blk.12.attn_k.weight                     | F32    | [4096, 4096]
model.layers.12.self_attn.o_proj.weight          -> blk.12.attn_output.weight                | F32    | [4096, 4096]
model.layers.12.self_attn.q_proj.weight          -> blk.12.attn_q.weight                     | F32    | [4096, 4096]
model.layers.12.self_attn.v_proj.weight          -> blk.12.attn_v.weight                     | F32    | [4096, 4096]
model.layers.13.input_layernorm.weight           -> blk.13.attn_norm.weight                  | F32    | [4096]
model.layers.13.mlp.down_proj.weight             -> blk.13.ffn_down.weight                   | F32    | [4096, 11008]
model.layers.13.mlp.gate_proj.weight             -> blk.13.ffn_gate.weight                   | F32    | [11008, 4096]
model.layers.13.mlp.up_proj.weight               -> blk.13.ffn_up.weight                     | F32    | [11008, 4096]
model.layers.13.post_attention_layernorm.weight  -> blk.13.ffn_norm.weight                   | F32    | [4096]
model.layers.13.self_attn.k_proj.weight          -> blk.13.attn_k.weight                     | F32    | [4096, 4096]
model.layers.13.self_attn.o_proj.weight          -> blk.13.attn_output.weight                | F32    | [4096, 4096]
model.layers.13.self_attn.q_proj.weight          -> blk.13.attn_q.weight                     | F32    | [4096, 4096]
model.layers.13.self_attn.v_proj.weight          -> blk.13.attn_v.weight                     | F32    | [4096, 4096]
model.layers.14.input_layernorm.weight           -> blk.14.attn_norm.weight                  | F32    | [4096]
model.layers.14.mlp.down_proj.weight             -> blk.14.ffn_down.weight                   | F32    | [4096, 11008]
model.layers.14.mlp.gate_proj.weight             -> blk.14.ffn_gate.weight                   | F32    | [11008, 4096]
model.layers.14.mlp.up_proj.weight               -> blk.14.ffn_up.weight                     | F32    | [11008, 4096]
model.layers.14.post_attention_layernorm.weight  -> blk.14.ffn_norm.weight                   | F32    | [4096]
model.layers.14.self_attn.k_proj.weight          -> blk.14.attn_k.weight                     | F32    | [4096, 4096]
model.layers.14.self_attn.o_proj.weight          -> blk.14.attn_output.weight                | F32    | [4096, 4096]
model.layers.14.self_attn.q_proj.weight          -> blk.14.attn_q.weight                     | F32    | [4096, 4096]
model.layers.14.self_attn.v_proj.weight          -> blk.14.attn_v.weight                     | F32    | [4096, 4096]
model.layers.15.input_layernorm.weight           -> blk.15.attn_norm.weight                  | F32    | [4096]
model.layers.15.mlp.down_proj.weight             -> blk.15.ffn_down.weight                   | F32    | [4096, 11008]
model.layers.15.mlp.gate_proj.weight             -> blk.15.ffn_gate.weight                   | F32    | [11008, 4096]
model.layers.15.mlp.up_proj.weight               -> blk.15.ffn_up.weight                     | F32    | [11008, 4096]
model.layers.15.post_attention_layernorm.weight  -> blk.15.ffn_norm.weight                   | F32    | [4096]
model.layers.15.self_attn.k_proj.weight          -> blk.15.attn_k.weight                     | F32    | [4096, 4096]
model.layers.15.self_attn.o_proj.weight          -> blk.15.attn_output.weight                | F32    | [4096, 4096]
model.layers.15.self_attn.q_proj.weight          -> blk.15.attn_q.weight                     | F32    | [4096, 4096]
model.layers.15.self_attn.v_proj.weight          -> blk.15.attn_v.weight                     | F32    | [4096, 4096]
model.layers.16.input_layernorm.weight           -> blk.16.attn_norm.weight                  | F32    | [4096]
model.layers.16.mlp.down_proj.weight             -> blk.16.ffn_down.weight                   | F32    | [4096, 11008]
model.layers.16.mlp.gate_proj.weight             -> blk.16.ffn_gate.weight                   | F32    | [11008, 4096]
model.layers.16.mlp.up_proj.weight               -> blk.16.ffn_up.weight                     | F32    | [11008, 4096]
model.layers.16.post_attention_layernorm.weight  -> blk.16.ffn_norm.weight                   | F32    | [4096]
model.layers.16.self_attn.k_proj.weight          -> blk.16.attn_k.weight                     | F32    | [4096, 4096]
model.layers.16.self_attn.o_proj.weight          -> blk.16.attn_output.weight                | F32    | [4096, 4096]
model.layers.16.self_attn.q_proj.weight          -> blk.16.attn_q.weight                     | F32    | [4096, 4096]
model.layers.16.self_attn.v_proj.weight          -> blk.16.attn_v.weight                     | F32    | [4096, 4096]
model.layers.17.self_attn.k_proj.weight          -> blk.17.attn_k.weight                     | F32    | [4096, 4096]
model.layers.17.self_attn.o_proj.weight          -> blk.17.attn_output.weight                | F32    | [4096, 4096]
model.layers.17.self_attn.q_proj.weight          -> blk.17.attn_q.weight                     | F32    | [4096, 4096]
model.layers.17.self_attn.v_proj.weight          -> blk.17.attn_v.weight                     | F32    | [4096, 4096]
model.layers.17.input_layernorm.weight           -> blk.17.attn_norm.weight                  | F32    | [4096]
model.layers.17.mlp.down_proj.weight             -> blk.17.ffn_down.weight                   | F32    | [4096, 11008]
model.layers.17.mlp.gate_proj.weight             -> blk.17.ffn_gate.weight                   | F32    | [11008, 4096]
model.layers.17.mlp.up_proj.weight               -> blk.17.ffn_up.weight                     | F32    | [11008, 4096]
model.layers.17.post_attention_layernorm.weight  -> blk.17.ffn_norm.weight                   | F32    | [4096]
model.layers.18.input_layernorm.weight           -> blk.18.attn_norm.weight                  | F32    | [4096]
model.layers.18.mlp.down_proj.weight             -> blk.18.ffn_down.weight                   | F32    | [4096, 11008]
model.layers.18.mlp.gate_proj.weight             -> blk.18.ffn_gate.weight                   | F32    | [11008, 4096]
model.layers.18.mlp.up_proj.weight               -> blk.18.ffn_up.weight                     | F32    | [11008, 4096]
model.layers.18.post_attention_layernorm.weight  -> blk.18.ffn_norm.weight                   | F32    | [4096]
model.layers.18.self_attn.k_proj.weight          -> blk.18.attn_k.weight                     | F32    | [4096, 4096]
model.layers.18.self_attn.o_proj.weight          -> blk.18.attn_output.weight                | F32    | [4096, 4096]
model.layers.18.self_attn.q_proj.weight          -> blk.18.attn_q.weight                     | F32    | [4096, 4096]
model.layers.18.self_attn.v_proj.weight          -> blk.18.attn_v.weight                     | F32    | [4096, 4096]
model.layers.19.input_layernorm.weight           -> blk.19.attn_norm.weight                  | F32    | [4096]
model.layers.19.mlp.down_proj.weight             -> blk.19.ffn_down.weight                   | F32    | [4096, 11008]
model.layers.19.mlp.gate_proj.weight             -> blk.19.ffn_gate.weight                   | F32    | [11008, 4096]
model.layers.19.mlp.up_proj.weight               -> blk.19.ffn_up.weight                     | F32    | [11008, 4096]
model.layers.19.post_attention_layernorm.weight  -> blk.19.ffn_norm.weight                   | F32    | [4096]
model.layers.19.self_attn.k_proj.weight          -> blk.19.attn_k.weight                     | F32    | [4096, 4096]
model.layers.19.self_attn.o_proj.weight          -> blk.19.attn_output.weight                | F32    | [4096, 4096]
model.layers.19.self_attn.q_proj.weight          -> blk.19.attn_q.weight                     | F32    | [4096, 4096]
model.layers.19.self_attn.v_proj.weight          -> blk.19.attn_v.weight                     | F32    | [4096, 4096]
model.layers.20.input_layernorm.weight           -> blk.20.attn_norm.weight                  | F32    | [4096]
model.layers.20.mlp.down_proj.weight             -> blk.20.ffn_down.weight                   | F32    | [4096, 11008]
model.layers.20.mlp.gate_proj.weight             -> blk.20.ffn_gate.weight                   | F32    | [11008, 4096]
model.layers.20.mlp.up_proj.weight               -> blk.20.ffn_up.weight                     | F32    | [11008, 4096]
model.layers.20.post_attention_layernorm.weight  -> blk.20.ffn_norm.weight                   | F32    | [4096]
model.layers.20.self_attn.k_proj.weight          -> blk.20.attn_k.weight                     | F32    | [4096, 4096]
model.layers.20.self_attn.o_proj.weight          -> blk.20.attn_output.weight                | F32    | [4096, 4096]
model.layers.20.self_attn.q_proj.weight          -> blk.20.attn_q.weight                     | F32    | [4096, 4096]
model.layers.20.self_attn.v_proj.weight          -> blk.20.attn_v.weight                     | F32    | [4096, 4096]
model.layers.21.input_layernorm.weight           -> blk.21.attn_norm.weight                  | F32    | [4096]
model.layers.21.mlp.down_proj.weight             -> blk.21.ffn_down.weight                   | F32    | [4096, 11008]
model.layers.21.mlp.gate_proj.weight             -> blk.21.ffn_gate.weight                   | F32    | [11008, 4096]
model.layers.21.mlp.up_proj.weight               -> blk.21.ffn_up.weight                     | F32    | [11008, 4096]
model.layers.21.post_attention_layernorm.weight  -> blk.21.ffn_norm.weight                   | F32    | [4096]
model.layers.21.self_attn.k_proj.weight          -> blk.21.attn_k.weight                     | F32    | [4096, 4096]
model.layers.21.self_attn.o_proj.weight          -> blk.21.attn_output.weight                | F32    | [4096, 4096]
model.layers.21.self_attn.q_proj.weight          -> blk.21.attn_q.weight                     | F32    | [4096, 4096]
model.layers.21.self_attn.v_proj.weight          -> blk.21.attn_v.weight                     | F32    | [4096, 4096]
model.layers.22.input_layernorm.weight           -> blk.22.attn_norm.weight                  | F32    | [4096]
model.layers.22.mlp.down_proj.weight             -> blk.22.ffn_down.weight                   | F32    | [4096, 11008]
model.layers.22.mlp.gate_proj.weight             -> blk.22.ffn_gate.weight                   | F32    | [11008, 4096]
model.layers.22.mlp.up_proj.weight               -> blk.22.ffn_up.weight                     | F32    | [11008, 4096]
model.layers.22.post_attention_layernorm.weight  -> blk.22.ffn_norm.weight                   | F32    | [4096]
model.layers.22.self_attn.k_proj.weight          -> blk.22.attn_k.weight                     | F32    | [4096, 4096]
model.layers.22.self_attn.o_proj.weight          -> blk.22.attn_output.weight                | F32    | [4096, 4096]
model.layers.22.self_attn.q_proj.weight          -> blk.22.attn_q.weight                     | F32    | [4096, 4096]
model.layers.22.self_attn.v_proj.weight          -> blk.22.attn_v.weight                     | F32    | [4096, 4096]
model.layers.23.self_attn.k_proj.weight          -> blk.23.attn_k.weight                     | F32    | [4096, 4096]
model.layers.23.self_attn.o_proj.weight          -> blk.23.attn_output.weight                | F32    | [4096, 4096]
model.layers.23.self_attn.q_proj.weight          -> blk.23.attn_q.weight                     | F32    | [4096, 4096]
model.layers.23.self_attn.v_proj.weight          -> blk.23.attn_v.weight                     | F32    | [4096, 4096]
model.layers.23.input_layernorm.weight           -> blk.23.attn_norm.weight                  | F32    | [4096]
model.layers.23.mlp.down_proj.weight             -> blk.23.ffn_down.weight                   | F32    | [4096, 11008]
model.layers.23.mlp.gate_proj.weight             -> blk.23.ffn_gate.weight                   | F32    | [11008, 4096]
model.layers.23.mlp.up_proj.weight               -> blk.23.ffn_up.weight                     | F32    | [11008, 4096]
model.layers.23.post_attention_layernorm.weight  -> blk.23.ffn_norm.weight                   | F32    | [4096]
model.layers.24.input_layernorm.weight           -> blk.24.attn_norm.weight                  | F32    | [4096]
model.layers.24.mlp.down_proj.weight             -> blk.24.ffn_down.weight                   | F32    | [4096, 11008]
model.layers.24.mlp.gate_proj.weight             -> blk.24.ffn_gate.weight                   | F32    | [11008, 4096]
model.layers.24.mlp.up_proj.weight               -> blk.24.ffn_up.weight                     | F32    | [11008, 4096]
model.layers.24.post_attention_layernorm.weight  -> blk.24.ffn_norm.weight                   | F32    | [4096]
model.layers.24.self_attn.k_proj.weight          -> blk.24.attn_k.weight                     | F32    | [4096, 4096]
model.layers.24.self_attn.o_proj.weight          -> blk.24.attn_output.weight                | F32    | [4096, 4096]
model.layers.24.self_attn.q_proj.weight          -> blk.24.attn_q.weight                     | F32    | [4096, 4096]
model.layers.24.self_attn.v_proj.weight          -> blk.24.attn_v.weight                     | F32    | [4096, 4096]
model.layers.25.input_layernorm.weight           -> blk.25.attn_norm.weight                  | F32    | [4096]
model.layers.25.mlp.down_proj.weight             -> blk.25.ffn_down.weight                   | F32    | [4096, 11008]
model.layers.25.mlp.gate_proj.weight             -> blk.25.ffn_gate.weight                   | F32    | [11008, 4096]
model.layers.25.mlp.up_proj.weight               -> blk.25.ffn_up.weight                     | F32    | [11008, 4096]
model.layers.25.post_attention_layernorm.weight  -> blk.25.ffn_norm.weight                   | F32    | [4096]
model.layers.25.self_attn.k_proj.weight          -> blk.25.attn_k.weight                     | F32    | [4096, 4096]
model.layers.25.self_attn.o_proj.weight          -> blk.25.attn_output.weight                | F32    | [4096, 4096]
model.layers.25.self_attn.q_proj.weight          -> blk.25.attn_q.weight                     | F32    | [4096, 4096]
model.layers.25.self_attn.v_proj.weight          -> blk.25.attn_v.weight                     | F32    | [4096, 4096]
model.layers.26.input_layernorm.weight           -> blk.26.attn_norm.weight                  | F32    | [4096]
model.layers.26.mlp.down_proj.weight             -> blk.26.ffn_down.weight                   | F32    | [4096, 11008]
model.layers.26.mlp.gate_proj.weight             -> blk.26.ffn_gate.weight                   | F32    | [11008, 4096]
model.layers.26.mlp.up_proj.weight               -> blk.26.ffn_up.weight                     | F32    | [11008, 4096]
model.layers.26.post_attention_layernorm.weight  -> blk.26.ffn_norm.weight                   | F32    | [4096]
model.layers.26.self_attn.k_proj.weight          -> blk.26.attn_k.weight                     | F32    | [4096, 4096]
model.layers.26.self_attn.o_proj.weight          -> blk.26.attn_output.weight                | F32    | [4096, 4096]
model.layers.26.self_attn.q_proj.weight          -> blk.26.attn_q.weight                     | F32    | [4096, 4096]
model.layers.26.self_attn.v_proj.weight          -> blk.26.attn_v.weight                     | F32    | [4096, 4096]
model.layers.27.input_layernorm.weight           -> blk.27.attn_norm.weight                  | F32    | [4096]
model.layers.27.mlp.down_proj.weight             -> blk.27.ffn_down.weight                   | F32    | [4096, 11008]
model.layers.27.mlp.gate_proj.weight             -> blk.27.ffn_gate.weight                   | F32    | [11008, 4096]
model.layers.27.mlp.up_proj.weight               -> blk.27.ffn_up.weight                     | F32    | [11008, 4096]
model.layers.27.post_attention_layernorm.weight  -> blk.27.ffn_norm.weight                   | F32    | [4096]
model.layers.27.self_attn.k_proj.weight          -> blk.27.attn_k.weight                     | F32    | [4096, 4096]
model.layers.27.self_attn.o_proj.weight          -> blk.27.attn_output.weight                | F32    | [4096, 4096]
model.layers.27.self_attn.q_proj.weight          -> blk.27.attn_q.weight                     | F32    | [4096, 4096]
model.layers.27.self_attn.v_proj.weight          -> blk.27.attn_v.weight                     | F32    | [4096, 4096]
model.layers.28.input_layernorm.weight           -> blk.28.attn_norm.weight                  | F32    | [4096]
model.layers.28.mlp.down_proj.weight             -> blk.28.ffn_down.weight                   | F32    | [4096, 11008]
model.layers.28.mlp.gate_proj.weight             -> blk.28.ffn_gate.weight                   | F32    | [11008, 4096]
model.layers.28.mlp.up_proj.weight               -> blk.28.ffn_up.weight                     | F32    | [11008, 4096]
model.layers.28.post_attention_layernorm.weight  -> blk.28.ffn_norm.weight                   | F32    | [4096]
model.layers.28.self_attn.k_proj.weight          -> blk.28.attn_k.weight                     | F32    | [4096, 4096]
model.layers.28.self_attn.o_proj.weight          -> blk.28.attn_output.weight                | F32    | [4096, 4096]
model.layers.28.self_attn.q_proj.weight          -> blk.28.attn_q.weight                     | F32    | [4096, 4096]
model.layers.28.self_attn.v_proj.weight          -> blk.28.attn_v.weight                     | F32    | [4096, 4096]
model.layers.29.self_attn.k_proj.weight          -> blk.29.attn_k.weight                     | F32    | [4096, 4096]
model.layers.29.self_attn.o_proj.weight          -> blk.29.attn_output.weight                | F32    | [4096, 4096]
model.layers.29.self_attn.q_proj.weight          -> blk.29.attn_q.weight                     | F32    | [4096, 4096]
model.layers.29.self_attn.v_proj.weight          -> blk.29.attn_v.weight                     | F32    | [4096, 4096]
lm_head.weight                                   -> output.weight                            | F32    | [32256, 4096]
model.layers.29.input_layernorm.weight           -> blk.29.attn_norm.weight                  | F32    | [4096]
model.layers.29.mlp.down_proj.weight             -> blk.29.ffn_down.weight                   | F32    | [4096, 11008]
model.layers.29.mlp.gate_proj.weight             -> blk.29.ffn_gate.weight                   | F32    | [11008, 4096]
model.layers.29.mlp.up_proj.weight               -> blk.29.ffn_up.weight                     | F32    | [11008, 4096]
model.layers.29.post_attention_layernorm.weight  -> blk.29.ffn_norm.weight                   | F32    | [4096]
model.layers.30.input_layernorm.weight           -> blk.30.attn_norm.weight                  | F32    | [4096]
model.layers.30.mlp.down_proj.weight             -> blk.30.ffn_down.weight                   | F32    | [4096, 11008]
model.layers.30.mlp.gate_proj.weight             -> blk.30.ffn_gate.weight                   | F32    | [11008, 4096]
model.layers.30.mlp.up_proj.weight               -> blk.30.ffn_up.weight                     | F32    | [11008, 4096]
model.layers.30.post_attention_layernorm.weight  -> blk.30.ffn_norm.weight                   | F32    | [4096]
model.layers.30.self_attn.k_proj.weight          -> blk.30.attn_k.weight                     | F32    | [4096, 4096]
model.layers.30.self_attn.o_proj.weight          -> blk.30.attn_output.weight                | F32    | [4096, 4096]
model.layers.30.self_attn.q_proj.weight          -> blk.30.attn_q.weight                     | F32    | [4096, 4096]
model.layers.30.self_attn.v_proj.weight          -> blk.30.attn_v.weight                     | F32    | [4096, 4096]
model.layers.31.input_layernorm.weight           -> blk.31.attn_norm.weight                  | F32    | [4096]
model.layers.31.mlp.down_proj.weight             -> blk.31.ffn_down.weight                   | F32    | [4096, 11008]
model.layers.31.mlp.gate_proj.weight             -> blk.31.ffn_gate.weight                   | F32    | [11008, 4096]
model.layers.31.mlp.up_proj.weight               -> blk.31.ffn_up.weight                     | F32    | [11008, 4096]
model.layers.31.post_attention_layernorm.weight  -> blk.31.ffn_norm.weight                   | F32    | [4096]
model.layers.31.self_attn.k_proj.weight          -> blk.31.attn_k.weight                     | F32    | [4096, 4096]
model.layers.31.self_attn.o_proj.weight          -> blk.31.attn_output.weight                | F32    | [4096, 4096]
model.layers.31.self_attn.q_proj.weight          -> blk.31.attn_q.weight                     | F32    | [4096, 4096]
model.layers.31.self_attn.v_proj.weight          -> blk.31.attn_v.weight                     | F32    | [4096, 4096]
model.norm.weight                                -> output_norm.weight                       | F32    | [4096]
Writing /Users/lawls/Development/models/Magicoder-S-DS-6.7B/ggml-model-f32.gguf, format 0
Traceback (most recent call last):
  File "/Users/lawls/Development/python/llama.cpp/convert.py", line 1279, in <module>
    main()
  File "/Users/lawls/Development/python/llama.cpp/convert.py", line 1273, in main
    OutputFile.write_all(outfile, ftype, params, model, vocab, special_vocab,
  File "/Users/lawls/Development/python/llama.cpp/convert.py", line 988, in write_all
    check_vocab_size(params, vocab, pad_vocab = pad_vocab)
  File "/Users/lawls/Development/python/llama.cpp/convert.py", line 860, in check_vocab_size
    raise Exception(msg)
Exception: Vocab size mismatch (model has 32256, but /Users/lawls/Development/models/Magicoder-S-DS-6.7B has 32022). Possibly try using the --padvocab option.

Using the --padvocab option produces a .gguf file, but whenever I try to load it, I get this error. I do not have this issue with deepseek-coder-6.7b-base. Beyond this, I have no idea what I am doing or what I would even do.

./server -m /Users/lawls/Development/models/Magicoder-S-DS-6.7B/ggml-model-f32.gguf --mlock
{"timestamp":1703169999,"level":"INFO","function":"main","line":2668,"message":"build info","build":1663,"commit":"799fc22"}
{"timestamp":1703169999,"level":"INFO","function":"main","line":2675,"message":"system info","n_threads":12,"n_threads_batch":-1,"total_threads":16,"system_info":"AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | "}
llama_model_loader: loaded meta data with 24 key-value pairs and 291 tensors from /Users/lawls/Development/models/Magicoder-S-DS-6.7B/ggml-model-f32.gguf (version GGUF V3 (latest))
llama_model_loader: - tensor    0:                token_embd.weight f32      [  4096, 32256,     1,     1 ]
llama_model_loader: - tensor    1:           blk.0.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor    2:            blk.0.ffn_down.weight f32      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor    3:            blk.0.ffn_gate.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor    4:              blk.0.ffn_up.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor    5:            blk.0.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor    6:              blk.0.attn_k.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor    7:         blk.0.attn_output.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor    8:              blk.0.attn_q.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor    9:              blk.0.attn_v.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   10:           blk.1.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   11:            blk.1.ffn_down.weight f32      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   12:            blk.1.ffn_gate.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   13:              blk.1.ffn_up.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   14:            blk.1.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   15:              blk.1.attn_k.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   16:         blk.1.attn_output.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   17:              blk.1.attn_q.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   18:              blk.1.attn_v.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   19:           blk.2.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   20:            blk.2.ffn_down.weight f32      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   21:            blk.2.ffn_gate.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   22:              blk.2.ffn_up.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   23:            blk.2.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   24:              blk.2.attn_k.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   25:         blk.2.attn_output.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   26:              blk.2.attn_q.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   27:              blk.2.attn_v.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   28:           blk.3.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   29:            blk.3.ffn_down.weight f32      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   30:            blk.3.ffn_gate.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   31:              blk.3.ffn_up.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   32:            blk.3.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   33:              blk.3.attn_k.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   34:         blk.3.attn_output.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   35:              blk.3.attn_q.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   36:              blk.3.attn_v.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   37:           blk.4.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   38:            blk.4.ffn_down.weight f32      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   39:            blk.4.ffn_gate.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   40:              blk.4.ffn_up.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   41:            blk.4.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   42:              blk.4.attn_k.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   43:         blk.4.attn_output.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   44:              blk.4.attn_q.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   45:              blk.4.attn_v.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   46:              blk.5.attn_k.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   47:         blk.5.attn_output.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   48:              blk.5.attn_q.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   49:              blk.5.attn_v.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   50:          blk.10.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   51:           blk.10.ffn_down.weight f32      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   52:           blk.10.ffn_gate.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   53:             blk.10.ffn_up.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   54:           blk.10.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   55:             blk.10.attn_k.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   56:        blk.10.attn_output.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   57:             blk.10.attn_q.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   58:             blk.10.attn_v.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   59:             blk.11.attn_k.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   60:        blk.11.attn_output.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   61:             blk.11.attn_q.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   62:             blk.11.attn_v.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   63:           blk.5.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   64:            blk.5.ffn_down.weight f32      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   65:            blk.5.ffn_gate.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   66:              blk.5.ffn_up.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   67:            blk.5.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   68:           blk.6.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   69:            blk.6.ffn_down.weight f32      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   70:            blk.6.ffn_gate.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   71:              blk.6.ffn_up.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   72:            blk.6.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   73:              blk.6.attn_k.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   74:         blk.6.attn_output.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   75:              blk.6.attn_q.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   76:              blk.6.attn_v.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   77:           blk.7.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   78:            blk.7.ffn_down.weight f32      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   79:            blk.7.ffn_gate.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   80:              blk.7.ffn_up.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   81:            blk.7.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   82:              blk.7.attn_k.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   83:         blk.7.attn_output.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   84:              blk.7.attn_q.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   85:              blk.7.attn_v.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   86:           blk.8.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   87:            blk.8.ffn_down.weight f32      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   88:            blk.8.ffn_gate.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   89:              blk.8.ffn_up.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   90:            blk.8.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   91:              blk.8.attn_k.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   92:         blk.8.attn_output.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   93:              blk.8.attn_q.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   94:              blk.8.attn_v.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor   95:           blk.9.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor   96:            blk.9.ffn_down.weight f32      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor   97:            blk.9.ffn_gate.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   98:              blk.9.ffn_up.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor   99:            blk.9.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  100:              blk.9.attn_k.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  101:         blk.9.attn_output.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  102:              blk.9.attn_q.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  103:              blk.9.attn_v.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  104:          blk.11.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  105:           blk.11.ffn_down.weight f32      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  106:           blk.11.ffn_gate.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  107:             blk.11.ffn_up.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  108:           blk.11.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  109:          blk.12.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  110:           blk.12.ffn_down.weight f32      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  111:           blk.12.ffn_gate.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  112:             blk.12.ffn_up.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  113:           blk.12.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  114:             blk.12.attn_k.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  115:        blk.12.attn_output.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  116:             blk.12.attn_q.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  117:             blk.12.attn_v.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  118:          blk.13.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  119:           blk.13.ffn_down.weight f32      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  120:           blk.13.ffn_gate.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  121:             blk.13.ffn_up.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  122:           blk.13.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  123:             blk.13.attn_k.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  124:        blk.13.attn_output.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  125:             blk.13.attn_q.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  126:             blk.13.attn_v.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  127:          blk.14.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  128:           blk.14.ffn_down.weight f32      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  129:           blk.14.ffn_gate.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  130:             blk.14.ffn_up.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  131:           blk.14.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  132:             blk.14.attn_k.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  133:        blk.14.attn_output.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  134:             blk.14.attn_q.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  135:             blk.14.attn_v.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  136:          blk.15.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  137:           blk.15.ffn_down.weight f32      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  138:           blk.15.ffn_gate.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  139:             blk.15.ffn_up.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  140:           blk.15.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  141:             blk.15.attn_k.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  142:        blk.15.attn_output.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  143:             blk.15.attn_q.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  144:             blk.15.attn_v.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  145:          blk.16.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  146:           blk.16.ffn_down.weight f32      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  147:           blk.16.ffn_gate.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  148:             blk.16.ffn_up.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  149:           blk.16.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  150:             blk.16.attn_k.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  151:        blk.16.attn_output.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  152:             blk.16.attn_q.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  153:             blk.16.attn_v.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  154:             blk.17.attn_k.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  155:        blk.17.attn_output.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  156:             blk.17.attn_q.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  157:             blk.17.attn_v.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  158:          blk.17.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  159:           blk.17.ffn_down.weight f32      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  160:           blk.17.ffn_gate.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  161:             blk.17.ffn_up.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  162:           blk.17.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  163:          blk.18.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  164:           blk.18.ffn_down.weight f32      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  165:           blk.18.ffn_gate.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  166:             blk.18.ffn_up.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  167:           blk.18.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  168:             blk.18.attn_k.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  169:        blk.18.attn_output.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  170:             blk.18.attn_q.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  171:             blk.18.attn_v.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  172:          blk.19.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  173:           blk.19.ffn_down.weight f32      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  174:           blk.19.ffn_gate.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  175:             blk.19.ffn_up.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  176:           blk.19.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  177:             blk.19.attn_k.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  178:        blk.19.attn_output.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  179:             blk.19.attn_q.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  180:             blk.19.attn_v.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  181:          blk.20.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  182:           blk.20.ffn_down.weight f32      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  183:           blk.20.ffn_gate.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  184:             blk.20.ffn_up.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  185:           blk.20.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  186:             blk.20.attn_k.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  187:        blk.20.attn_output.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  188:             blk.20.attn_q.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  189:             blk.20.attn_v.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  190:          blk.21.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  191:           blk.21.ffn_down.weight f32      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  192:           blk.21.ffn_gate.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  193:             blk.21.ffn_up.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  194:           blk.21.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  195:             blk.21.attn_k.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  196:        blk.21.attn_output.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  197:             blk.21.attn_q.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  198:             blk.21.attn_v.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  199:          blk.22.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  200:           blk.22.ffn_down.weight f32      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  201:           blk.22.ffn_gate.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  202:             blk.22.ffn_up.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  203:           blk.22.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  204:             blk.22.attn_k.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  205:        blk.22.attn_output.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  206:             blk.22.attn_q.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  207:             blk.22.attn_v.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  208:             blk.23.attn_k.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  209:        blk.23.attn_output.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  210:             blk.23.attn_q.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  211:             blk.23.attn_v.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  212:          blk.23.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  213:           blk.23.ffn_down.weight f32      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  214:           blk.23.ffn_gate.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  215:             blk.23.ffn_up.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  216:           blk.23.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  217:          blk.24.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  218:           blk.24.ffn_down.weight f32      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  219:           blk.24.ffn_gate.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  220:             blk.24.ffn_up.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  221:           blk.24.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  222:             blk.24.attn_k.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  223:        blk.24.attn_output.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  224:             blk.24.attn_q.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  225:             blk.24.attn_v.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  226:          blk.25.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  227:           blk.25.ffn_down.weight f32      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  228:           blk.25.ffn_gate.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  229:             blk.25.ffn_up.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  230:           blk.25.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  231:             blk.25.attn_k.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  232:        blk.25.attn_output.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  233:             blk.25.attn_q.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  234:             blk.25.attn_v.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  235:          blk.26.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  236:           blk.26.ffn_down.weight f32      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  237:           blk.26.ffn_gate.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  238:             blk.26.ffn_up.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  239:           blk.26.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  240:             blk.26.attn_k.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  241:        blk.26.attn_output.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  242:             blk.26.attn_q.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  243:             blk.26.attn_v.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  244:          blk.27.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  245:           blk.27.ffn_down.weight f32      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  246:           blk.27.ffn_gate.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  247:             blk.27.ffn_up.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  248:           blk.27.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  249:             blk.27.attn_k.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  250:        blk.27.attn_output.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  251:             blk.27.attn_q.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  252:             blk.27.attn_v.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  253:          blk.28.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  254:           blk.28.ffn_down.weight f32      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  255:           blk.28.ffn_gate.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  256:             blk.28.ffn_up.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  257:           blk.28.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  258:             blk.28.attn_k.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  259:        blk.28.attn_output.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  260:             blk.28.attn_q.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  261:             blk.28.attn_v.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  262:             blk.29.attn_k.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  263:        blk.29.attn_output.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  264:             blk.29.attn_q.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  265:             blk.29.attn_v.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  266:                    output.weight f32      [  4096, 32256,     1,     1 ]
llama_model_loader: - tensor  267:          blk.29.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  268:           blk.29.ffn_down.weight f32      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  269:           blk.29.ffn_gate.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  270:             blk.29.ffn_up.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  271:           blk.29.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  272:          blk.30.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  273:           blk.30.ffn_down.weight f32      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  274:           blk.30.ffn_gate.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  275:             blk.30.ffn_up.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  276:           blk.30.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  277:             blk.30.attn_k.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  278:        blk.30.attn_output.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  279:             blk.30.attn_q.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  280:             blk.30.attn_v.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  281:          blk.31.attn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  282:           blk.31.ffn_down.weight f32      [ 11008,  4096,     1,     1 ]
llama_model_loader: - tensor  283:           blk.31.ffn_gate.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  284:             blk.31.ffn_up.weight f32      [  4096, 11008,     1,     1 ]
llama_model_loader: - tensor  285:           blk.31.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: - tensor  286:             blk.31.attn_k.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  287:        blk.31.attn_output.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  288:             blk.31.attn_q.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  289:             blk.31.attn_v.weight f32      [  4096,  4096,     1,     1 ]
llama_model_loader: - tensor  290:               output_norm.weight f32      [  4096,     1,     1,     1 ]
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = models
llama_model_loader: - kv   2:                       llama.context_length u32              = 16384
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 11008
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 32
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  10:                       llama.rope.freq_base f32              = 100000.000000
llama_model_loader: - kv  11:                    llama.rope.scaling.type str              = linear
llama_model_loader: - kv  12:                  llama.rope.scaling.factor f32              = 4.000000
llama_model_loader: - kv  13:                          general.file_type u32              = 0
llama_model_loader: - kv  14:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  15:                      tokenizer.ggml.tokens arr[str,32256]   = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv  16:                      tokenizer.ggml.scores arr[f32,32256]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  17:                  tokenizer.ggml.token_type arr[i32,32256]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv  18:                      tokenizer.ggml.merges arr[str,31757]   = ["Ġ Ġ", "Ġ t", "Ġ a", "i n", "h e...
llama_model_loader: - kv  19:                tokenizer.ggml.bos_token_id u32              = 32013
llama_model_loader: - kv  20:                tokenizer.ggml.eos_token_id u32              = 32014
llama_model_loader: - kv  21:            tokenizer.ggml.padding_token_id u32              = 32014
llama_model_loader: - kv  22:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  23:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - type  f32:  291 tensors
error loading model: unordered_map::at: key not found
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '/Users/lawls/Development/models/Magicoder-S-DS-6.7B/ggml-model-f32.gguf'
{"timestamp":1703169999,"level":"ERROR","function":"load_model","line":581,"message":"unable to load model","model":"/Users/lawls/Development/models/Magicoder-S-DS-6.7B/ggml-model-f32.gguf"}

Yeah, --pad-vocab doesn't help the situation: while it converts fine, at inference time the model generates garbage. TheBloke's earlier quants didn't work either - llama.cpp exits with a vocab-related error. The matthoffner/Magicoder-S-DS-6.7B-GGUF quants worked for me.
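
As a follow-up, a quick way to catch the "converts fine but generates garbage" case early is to smoke-test the resulting GGUF before quantizing or deploying it. A rough sketch, assuming llama-cpp-python is installed and with the GGUF path as a placeholder:

```python
# Rough smoke-test sketch (not from the thread). Assumes llama-cpp-python is
# installed; the model path is a placeholder for your converted GGUF file.
from llama_cpp import Llama

llm = Llama(model_path="Magicoder-S-DS-6.7B/ggml-model-f16.gguf", n_ctx=2048)

# Greedy completion on a trivial coding prompt; incoherent or repetitive
# output here usually points at a broken vocab/conversion rather than the model.
out = llm("def fibonacci(n):", max_tokens=64, temperature=0.0)
print(out["choices"][0]["text"])
```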
