Model Summary
NLLB-CLIP combines the text encoder from the NLLB model with the image encoder from the standard CLIP model. This extends the model's capabilities to the 201 languages of Flores-200. NLLB-CLIP sets a new state of the art on the Crossmodal-3600 dataset, driven by strong performance on low-resource languages. You can find more details about the model in the paper.
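To illustrate how a dual-encoder model of this kind is typically used, here is a minimal sketch of CLIP-style image-to-caption matching in plain Python. It does not load NLLB-CLIP: the embedding values are hypothetical placeholders standing in for the outputs of the image and text encoders, and the function names and the `scale` value of 100 are assumptions made for the illustration, not details taken from the model.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cosine_scores(image_emb, text_embs):
    """Cosine similarity between one image embedding and each text embedding."""
    img = l2_normalize(image_emb)
    return [sum(a * b for a, b in zip(img, l2_normalize(t))) for t in text_embs]

def softmax(scores, scale=100.0):
    """Turn similarities into probabilities; `scale` is an assumed logit scale."""
    scaled = [s * scale for s in scores]
    top = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical placeholder embeddings. In practice the image vector would come
# from the image encoder and the caption vectors from the multilingual text encoder.
image_embedding = [0.9, 0.1, 0.2]
caption_embeddings = {
    "a photo of a cat": [0.8, 0.2, 0.1],
    "une photo d'un chien": [0.1, 0.9, 0.3],
    "ein Foto von einem Auto": [0.2, 0.1, 0.9],
}

scores = cosine_scores(image_embedding, list(caption_embeddings.values()))
probs = softmax(scores)
best_caption = max(zip(caption_embeddings, probs), key=lambda pair: pair[1])[0]
print(best_caption)
```

Because both encoders map into the same embedding space, captions in different languages can be ranked against the same image with this one similarity computation.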
Acknowledgements
I thank ML Collective for providing Google Cloud compute resources to train the OpenCLIP-compatible version of NLLB-CLIP.