InternViT-6B + QLLaMA: can it be used for image-text retrieval like CLIP?

#5
by vitvit - opened

Can you provide an example? (using text and image)

OpenGVLab org

Hi, please see the quick start section in the model card.

https://huggingface.co./OpenGVLab/InternVL-14B-224px#quick-start

It is not clear. It specifies how to load the image encoder, but not the text encoder.

I agree with vitvit. Is there a way to get CLIP-like embeddings out of the model that could be indexed into a vector database and searched later?
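For the retrieval side, here is a minimal sketch of how CLIP-style search over such embeddings could work. The model-loading calls in the comments (`encode_image`, `encode_text`, the `mode='InternVL-C'` argument) are assumptions based on the model card's quick start and may differ from the actual remote code; the runnable part below uses dummy tensors in place of real model outputs so the similarity step itself is concrete.

```python
import torch
import torch.nn.functional as F

# Stand-ins for real embeddings. With the actual model, these would come from
# something like the following (hypothetical call names -- check the remote
# code shipped with the checkpoint for the exact API):
#   model = AutoModel.from_pretrained('OpenGVLab/InternVL-14B-224px',
#                                     trust_remote_code=True)
#   img_emb = model.encode_image(pixel_values, mode='InternVL-C')
#   txt_emb = model.encode_text(input_ids)
torch.manual_seed(0)
image_embeddings = torch.randn(4, 768)  # 4 indexed images (dummy dim)
text_embedding = torch.randn(1, 768)    # 1 query caption

# L2-normalize, as CLIP does, so a dot product equals cosine similarity.
image_embeddings = F.normalize(image_embeddings, dim=-1)
text_embedding = F.normalize(text_embedding, dim=-1)

# Cosine similarity of the query against every indexed image.
scores = text_embedding @ image_embeddings.T  # shape (1, 4)
best = scores.argmax(dim=-1)
print(scores.shape, best.item())
```

The normalized image vectors are exactly what you would push into a vector database (with cosine or inner-product as the metric); the text embedding then serves as the query vector at search time.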
