TIGER-Lab/VLM2Vec-Qwen2VL-7B

memray committed 3 days ago

Commit ca62cd9 · verified · 1 parent: bb6d72b

Update README.md

Files changed (1)

README.md +9 -0

@@ -28,6 +28,15 @@ Our model is being trained on MMEB-train and evaluated on MMEB-eval with contras
 ## Performance
 This model outperforms the baselines and previous version of VLM2Vec by a large margin.
+| Model                                 | Classification | VQA  | Retrieval | Grounding | IND  | OOD  | Overall |
+|---------------------------------------|---------------|------|-----------|-----------|------|------|---------|
+| Phi-3.5-V, Full-model fine-tuned (#crop=4) | 52.8  | 50.3 | 57.8  | 72.3  | 62.8 | 47.4 | 55.9  |
+| Phi-3.5-V, LoRA            | 54.8  | 54.9 | 62.3  | 79.5  | 66.5 | 52.0 | 60.1  |
+| LLaVA-1.6, LoRA            | 54.7  | 50.3 | 56.2  | 64.0  | 61.0 | 47.5 | 55.0  |
+| LLaVA-1.6, LoRA            | 61.2  | 49.9 | 67.4  | 86.1  | 67.5 | 57.1 | 62.9  |
+| Qwen2-VL-2B, LoRA          | 59.0  | 49.4 | 65.4  | 73.4  | 66.0 | 52.6 | 60.1  |
+| **Qwen2-VL-7B, LoRA (this model)**          | **62.6**  | **57.8** | **69.9**  | 81.7  | **72.2** | **57.8** | **65.8**  |
 ![image/png](https://github.com/TIGER-AI-Lab/VLM2Vec/blob/main/figures/vlm2vec_results.png?raw=true)