The current checkpoint doesn't use group query attention.

#3
by yaya-sy - opened

When I tried to load the model using:

import torch
from transformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained("lelapa/InkubaLM-0.4B",
                                           torch_dtype=torch.float16)

I encountered the following error:

RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
    size mismatch for model.layers.0.self_attn.k_proj.weight: copying a param with shape torch.Size([2048, 2048]) from checkpoint, the shape in current model is torch.Size([256, 2048]).
    size mismatch for model.layers.0.self_attn.v_proj.weight: copying a param with shape torch.Size([2048, 2048]) from checkpoint, the shape in current model is torch.Size([256, 2048]).
    size mismatch for model.layers.1.self_attn.k_proj.weight: copying a param with shape torch.Size([2048, 2048]) from checkpoint, the shape in current model is torch.Size([256, 2048]).
    size mismatch for model.layers.1.self_attn.v_proj.weight: copying a param with shape torch.Size([2048, 2048]) from checkpoint, the shape in current model is torch.Size([256, 2048]).
    size mismatch for model.layers.2.self_attn.k_proj.weight: copying a param with shape torch.Size([2048, 2048]) from checkpoint, the shape in current model is torch.Size([256, 2048]).
    size mismatch for model.layers.2.self_attn.v_proj.weight: copying a param with shape torch.Size([2048, 2048]) from checkpoint, the shape in current model is torch.Size([256, 2048]).
    size mismatch for model.layers.3.self_attn.k_proj.weight: copying a param with shape torch.Size([2048, 2048]) from checkpoint, the shape in current model is torch.Size([256, 2048]).
    size mismatch for model.layers.3.self_attn.v_proj.weight: copying a param with shape torch.Size([2048, 2048]) from checkpoint, the shape in current model is torch.Size([256, 2048]).
    size mismatch for model.layers.4.self_attn.k_proj.weight: copying a param with shape torch.Size([2048, 2048]) from checkpoint, the shape in current model is torch.Size([256, 2048]).
    size mismatch for model.layers.4.self_attn.v_proj.weight: copying a param with shape torch.Size([2048, 2048]) from checkpoint, the shape in current model is torch.Size([256, 2048]).
    size mismatch for model.layers.5.self_attn.k_proj.weight: copying a param with shape torch.Size([2048, 2048]) from checkpoint, the shape in current model is torch.Size([256, 2048]).
    size mismatch for model.layers.5.self_attn.v_proj.weight: copying a param with shape torch.Size([2048, 2048]) from checkpoint, the shape in current model is torch.Size([256, 2048]).
    size mismatch for model.layers.6.self_attn.k_proj.weight: copying a param with shape torch.Size([2048, 2048]) from checkpoint, the shape in current model is torch.Size([256, 2048]).
    size mismatch for model.layers.6.self_attn.v_proj.weight: copying a param with shape torch.Size([2048, 2048]) from checkpoint, the shape in current model is torch.Size([256, 2048]).
    size mismatch for model.layers.7.self_attn.k_proj.weight: copying a param with shape torch.Size([2048, 2048]) from checkpoint, the shape in current model is torch.Size([256, 2048]).
    size mismatch for model.layers.7.self_attn.v_proj.weight: copying a param with shape torch.Size([2048, 2048]) from checkpoint, the shape in current model is torch.Size([256, 2048]).
    You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.

This error suggests that the current checkpoint uses standard multi-head attention rather than grouped-query attention: the k_proj and v_proj matrices in the checkpoint are square (2048 x 2048), whereas the shipped config (num_attention_heads = 32, so head_dim = 2048 / 32 = 64) implies num_key_value_heads = 4 and therefore projections of shape 256 x 2048. To fix this, I modified config.json so that num_key_value_heads = num_attention_heads = 32.
That is the purpose of this pull request.
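
For reference, a minimal sketch of the same workaround done in code rather than by editing config.json. It assumes the checkpoint really stores full 2048 x 2048 k/v projections and loads as a plain LlamaForCausalLM:

import torch
from transformers import AutoConfig, AutoModelForCausalLM

# Load the published config and force standard multi-head attention,
# i.e. one key/value head per query head.
config = AutoConfig.from_pretrained("lelapa/InkubaLM-0.4B")
config.num_key_value_heads = config.num_attention_heads

# The expected k_proj / v_proj shapes now match the checkpoint.
llm = AutoModelForCausalLM.from_pretrained(
    "lelapa/InkubaLM-0.4B",
    config=config,
    torch_dtype=torch.float16,
)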

yaya-sy changed pull request title from The actual checkpoint doesn't use group query attention. to The current checkpoint doesn't use group query attention.
Lelapa AI org

Hello, when loading the model, add trust_remote_code=True

e.g.

llm = AutoModelForCausalLM.from_pretrained("lelapa/InkubaLM-0.4B", torch_dtype=torch.float16, trust_remote_code=True)
Atnafu changed pull request status to merged
