Shards
This model needs to be resharded into 1 GB segments. There is also the problem that the vocabulary is not aligned to 32000, while other models are aligned to that number, which suggests the tokenizer for this model is also incorrect. As a result, the model is not very smart in conversation, although it did identify pictures. The tokenizer mistake means the model is still unusable for merging etc.!
Please reshard this model, and also update the architecture to something that Transformers recognizes in every version of hf.transformers.
@LeroyDyer LLaVA 1.6 should be far better than this. Check out llava-1.6-mistral-hf, since that works with Hugging Face and is much better at language, answering VQA, and captioning images.
Also, I would strongly recommend against merging LLaVA models, as that will most likely reduce VQA and captioning performance by a lot.