ViT size mismatch of mlp (ffn) tensors
#2 · by cmp-nct · opened
First of all: congratulations on the llava-1.6 launch. You've just showcased how simple a solution can be while staying on par with much more complex architectures.
I'm running into a problem with your ViT:
size mismatch for vision_model.encoder.layers.23.mlp.fc1.bias: copying a param with shape torch.Size([4096]) from checkpoint, the shape in current model is torch.Size([13824]).
size mismatch for vision_model.encoder.layers.23.mlp.fc2.weight: copying a param with shape torch.Size([1024, 4096]) from checkpoint, the shape in current model is torch.Size([1024, 13824]).
That's from your embedded ViT model; its MLP tensors appear to have a larger shape than normal.
It also differs from the foundation model (336 patch).
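For anyone hitting the same error: it is raised by `load_state_dict` whenever a checkpoint tensor's shape differs from the instantiated model's config. A minimal sketch of the mechanism, using toy `nn.Linear` layers with the intermediate sizes from the error above (the dimensions are illustrative, not the actual LLaVA code):

```python
import torch
import torch.nn as nn

# fc1 as stored in the checkpoint: 1024 -> 4096 (standard CLIP ViT-L MLP)
checkpoint_fc1 = nn.Linear(1024, 4096)

# fc1 as the current model config expects it: 1024 -> 13824
model_fc1 = nn.Linear(1024, 13824)

# Copying the checkpoint params into the differently-shaped layer fails
# with the same "size mismatch ... copying a param with shape" error.
try:
    model_fc1.load_state_dict(checkpoint_fc1.state_dict())
except RuntimeError as e:
    print("size mismatch" in str(e))
```

So the fix is on the config side: the vision tower's intermediate (FFN) size in the model config has to match the shapes actually stored in the checkpoint.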
cmp-nct changed discussion status to closed