Finetuning script for Phi3.5-Vision
#24 by 2U1
https://github.com/2U1/Phi3-Vision-Finetune
I made a fine-tuning script for Phi3.5-Vision. It supports single-image, multi-image, and video datasets.
You can select which modules to fine-tune (vision encoder, LLM, projector) and set a different learning rate for each.
Feedback and issues are welcome!
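For anyone curious how per-module learning rates work, here is a minimal PyTorch sketch using optimizer parameter groups. The submodule names (`vision_tower`, `projector`, `language_model`) and the helper `build_optimizer` are illustrative assumptions, not the actual attribute names or API of the repo:

```python
import torch
import torch.nn as nn

class TinyVLM(nn.Module):
    """Stand-in for a vision-language model with three trainable parts.
    The real Phi3.5-Vision modules have different names and shapes."""
    def __init__(self):
        super().__init__()
        self.vision_tower = nn.Linear(8, 8)    # placeholder vision encoder
        self.projector = nn.Linear(8, 8)       # placeholder vision->LLM projector
        self.language_model = nn.Linear(8, 8)  # placeholder LLM

def build_optimizer(model, lr_vision=2e-6, lr_projector=1e-5, lr_llm=1e-5):
    # One parameter group per module, each with its own learning rate.
    groups = [
        {"params": model.vision_tower.parameters(), "lr": lr_vision},
        {"params": model.projector.parameters(), "lr": lr_projector},
        {"params": model.language_model.parameters(), "lr": lr_llm},
    ]
    return torch.optim.AdamW(groups)

model = TinyVLM()
# To skip training a module entirely, freeze its parameters:
for p in model.vision_tower.parameters():
    p.requires_grad = False

opt = build_optimizer(model)
print([g["lr"] for g in opt.param_groups])
```

Freezing a module plus grouping parameters this way lets you, for example, train the projector aggressively while barely touching (or not touching) the pretrained vision encoder.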
Can you share the fine-tuning script? I want to train this model to recognize text and emojis, as well as the corresponding layout.
Can it be used for VQA tasks?