Finetuning script for Phi3.5-Vision

#24 · opened by 2U1

https://github.com/2U1/Phi3-Vision-Finetune

I made a fine-tuning script for Phi3.5-Vision. It supports single-image, multi-image, and video datasets.
You can select each module (Vision, LLM, Projector) for fine-tuning and set a different learning rate for each; a sketch of how that can look is below.
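
As a rough illustration (not the repo's actual code), per-module learning rates can be wired up with `torch.optim` parameter groups. The substrings used below to identify the vision tower and projector parameters, and the learning-rate values, are assumptions and may differ from the script's real configuration.

```python
import torch
from transformers import AutoModelForCausalLM

# Load the base model (trust_remote_code is required for Phi-3.5-Vision).
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3.5-vision-instruct",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)

# Split trainable parameters into vision / projector / LLM groups.
vision_params, projector_params, llm_params = [], [], []
for name, param in model.named_parameters():
    if not param.requires_grad:
        continue
    if "img_projection" in name:          # assumed projector module name
        projector_params.append(param)
    elif "vision_embed_tokens" in name:   # assumed vision tower name
        vision_params.append(param)
    else:
        llm_params.append(param)

# One optimizer, but each module gets its own learning rate (example values only).
optimizer = torch.optim.AdamW([
    {"params": vision_params,    "lr": 2e-6},
    {"params": projector_params, "lr": 1e-5},
    {"params": llm_params,       "lr": 1e-5},
])
```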

Feedback and issues are welcome!

Can you share the fine-tuning script? I want to train this model to recognize text and emojis, as well as the corresponding layout.

@adnanPBI You can visit the repo and use it!

Can it be used for VQA tasks?

@dutta18 Yes, it can be used for VQA tasks.
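
For VQA-style inference with the fine-tuned checkpoint, the standard Phi-3.5-Vision usage pattern applies. Below is a minimal sketch; the checkpoint path `./output/phi35-vision-ft` and the image file are hypothetical placeholders.

```python
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_path = "./output/phi35-vision-ft"  # assumed fine-tuned checkpoint directory
model = AutoModelForCausalLM.from_pretrained(
    model_path, trust_remote_code=True, torch_dtype=torch.bfloat16, device_map="cuda"
)
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)

# Ask a question about a single image (placeholder file name).
image = Image.open("example.jpg")
messages = [{"role": "user", "content": "<|image_1|>\nWhat color is the car in this photo?"}]
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(prompt, [image], return_tensors="pt").to("cuda")

output_ids = model.generate(**inputs, max_new_tokens=64, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```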
