InternViT-6B + QLLaMA: can it be used for image-text retrieval like CLIP?

#5
by vitvit - opened

Can you provide an example? (using text and image)

OpenGVLab org

Hi, please see the quick start section in the model card.

https://huggingface.co./OpenGVLab/InternVL-14B-224px#quick-start

It is not clear. It specifies how to load the image encoder, but not the text encoder.

I agree with vitvit. Is there a way to get CLIP-like embeddings out of the model that could be indexed into a vector database and searched later?
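For the retrieval side, here is a minimal sketch of how CLIP-style search over such embeddings could work. The model-loading calls in the comments (`encode_image`, `encode_text`, the `mode='InternVL-C'` argument) are assumptions based on the model card's quick start and may differ from the actual remote code; the runnable part below uses dummy tensors in place of real model outputs so the similarity step itself is concrete.

```python
import torch
import torch.nn.functional as F

# Stand-ins for real embeddings. With the actual model, these would come from
# something like the following (hypothetical call names -- check the remote
# code shipped with the checkpoint for the exact API):
#   model = AutoModel.from_pretrained('OpenGVLab/InternVL-14B-224px',
#                                     trust_remote_code=True)
#   img_emb = model.encode_image(pixel_values, mode='InternVL-C')
#   txt_emb = model.encode_text(input_ids)
torch.manual_seed(0)
image_embeddings = torch.randn(4, 768)  # 4 indexed images (dummy dim)
text_embedding = torch.randn(1, 768)    # 1 query caption

# L2-normalize, as CLIP does, so a dot product equals cosine similarity.
image_embeddings = F.normalize(image_embeddings, dim=-1)
text_embedding = F.normalize(text_embedding, dim=-1)

# Cosine similarity of the query against every indexed image.
scores = text_embedding @ image_embeddings.T  # shape (1, 4)
best = scores.argmax(dim=-1)
print(scores.shape, best.item())
```

The normalized image vectors are exactly what you would push into a vector database (with cosine or inner-product as the metric); the text embedding then serves as the query vector at search time.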
