nanoLLaVA-1.5 is here! Same size (1B), much better performance than v1.0 🔥🔥🔥 Try it out now on HF Spaces: qnguyen3/nanoLLaVA. Model: qnguyen3/nanoLLaVA-1.5
🎉 Introducing nanoLLaVA, a powerful multimodal AI model that packs the capabilities of a 1B-parameter vision-language model into just 5 GB of VRAM. 🚀 That small footprint makes it an ideal choice for edge devices, bringing cutting-edge visual understanding and text generation to your devices like never before. 📱💻
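(Back-of-envelope, assuming fp16 weights: ~1B parameters × 2 bytes ≈ 2 GB for the weights themselves; the vision tower, projector, activations, and KV cache account for the rest, which is how the whole stack stays within the 5 GB budget.)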
Under the hood, nanoLLaVA pairs the powerful vilm/Quyen-SE-v0.1 (my Qwen1.5-0.5B fine-tune) as the language backbone with Google's impressive google/siglip-so400m-patch14-384 as the vision encoder. 🧠 The model is trained with a data-centric approach, prioritizing the quality of the training data over sheer scale. 📊
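If you want to poke at it locally, here is a minimal inference sketch using the transformers remote-code path. The `<image>` placeholder handling, the `-200` sentinel id, and the `model.process_images` helper are assumptions carried over from LLaVA-style custom modeling code, not guaranteed API; treat the model card as the authoritative reference.

```python
# Minimal nanoLLaVA inference sketch (assumes the checkpoint's custom
# remote-code interface; see the model card for the exact API).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image

model_id = "qnguyen3/nanoLLaVA"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # fp16 keeps the ~1B weights near 2 GB
    device_map="auto",
    trust_remote_code=True,     # loads the custom LLaVA-style modeling code
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Build a chat prompt with an <image> placeholder where visual tokens go.
messages = [{"role": "user", "content": "<image>\nDescribe this image in detail."}]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Assumption: the remote code swaps a sentinel id (-200 here) for the
# projected image features, mirroring the original LLaVA convention.
chunks = [tokenizer(chunk).input_ids for chunk in text.split("<image>")]
input_ids = torch.tensor(
    chunks[0] + [-200] + chunks[1], dtype=torch.long
).unsqueeze(0)

image = Image.open("example.jpg")
image_tensor = model.process_images([image], model.config)  # assumed helper

output_ids = model.generate(
    input_ids.to(model.device),
    images=image_tensor,
    max_new_tokens=256,
    use_cache=True,
)[0]
print(tokenizer.decode(output_ids[input_ids.shape[1]:], skip_special_tokens=True))
```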
In the spirit of transparency and collaboration, all code and model weights are open-sourced under the Apache 2.0 license. 🤝
Full fine-tuning of Microsoft's Phi-2 on a single RTX 4090 is now supported in axolotl. Thanks to @abacaj and @vikhyatk for their help with the gradient-checkpointing and flash-attention fixes.
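For anyone wanting to reproduce this, here is a hypothetical axolotl YAML sketch for a full-parameter Phi-2 fine-tune on a 24 GB card. The dataset path and hyperparameters are placeholders, and key names can shift between axolotl versions, so check the repo's example configs before running.

```yaml
# Hypothetical axolotl config: full fine-tune of Phi-2 on one 24 GB RTX 4090.
# Dataset path and hyperparameters are placeholders, not recommendations.
base_model: microsoft/phi-2
trust_remote_code: true

load_in_8bit: false
load_in_4bit: false            # full fine-tune, no quantization

datasets:
  - path: ./data/my_dataset.jsonl   # placeholder dataset
    type: alpaca

sequence_len: 2048
micro_batch_size: 1
gradient_accumulation_steps: 8

# The two fixes called out above are what make a single 24 GB card viable:
gradient_checkpointing: true
flash_attention: true

optimizer: adamw_bnb_8bit      # 8-bit optimizer states to stay within VRAM
lr_scheduler: cosine
learning_rate: 2e-5
num_epochs: 1
bf16: true

output_dir: ./outputs/phi2-full-ft
```

The combination of 8-bit optimizer states, gradient checkpointing, and flash attention is what brings full-parameter training of a 2.7B model within a 24 GB memory budget.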