Model card for CLIP ViT-T-16 distilled on CC3M and CC12M from a CLIP ViT-B-16 LAION-400M teacher

Source checkpoint: ViT-B-16_cc3m_12m_kd_ViT-T-16_cc3m_12m_ep32.pt

Model Details

Model Description

A CLIP ViT-T-16 student model distilled on the CC3M and CC12M datasets from a CLIP ViT-B-16 teacher trained on LAION-400M.
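To make "distilled from a teacher" concrete, here is a minimal NumPy sketch of one common CLIP distillation objective: a relational loss where the student matches the teacher's softmax distribution over image-to-text and text-to-image similarities within a batch. This is an illustration, not the exact recipe used to produce this checkpoint (the CLIP-KD paper studies several losses). The helper names, the temperature of 0.07, and the embedding sizes are hypothetical choices for the example.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    # Project embeddings onto the unit sphere, as CLIP does before computing cosine similarity.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def log_softmax(logits, axis=-1):
    # Numerically stable log-softmax.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def relational_kd_loss(t_img, t_txt, s_img, s_txt, temperature=0.07):
    """Hypothetical helper: mean KL(teacher || student) over the rows of the
    image-to-text and text-to-image similarity matrices of one batch.
    The temperature value is an illustrative assumption."""
    t_logits = l2_normalize(t_img) @ l2_normalize(t_txt).T / temperature
    s_logits = l2_normalize(s_img) @ l2_normalize(s_txt).T / temperature
    loss = 0.0
    # Rows of the matrix give image->text distributions; rows of its transpose give text->image.
    for t, s in ((t_logits, s_logits), (t_logits.T, s_logits.T)):
        log_p_t = log_softmax(t, axis=1)
        log_p_s = log_softmax(s, axis=1)
        loss += (np.exp(log_p_t) * (log_p_t - log_p_s)).sum(axis=1).mean()
    return loss / 2

# Usage with random stand-in embeddings (batch of 4). The 512/256 widths are
# hypothetical; teacher and student widths may differ because only the
# batch x batch similarity matrices are compared.
rng = np.random.default_rng(0)
t_img, t_txt = rng.normal(size=(4, 512)), rng.normal(size=(4, 512))
s_img, s_txt = rng.normal(size=(4, 256)), rng.normal(size=(4, 256))
print(relational_kd_loss(t_img, t_txt, s_img, s_txt))
```

Because the loss compares similarity structure rather than raw features, no projection layer is needed to bridge the teacher's and student's embedding widths. A feature-matching loss would need one.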

Reference

For details of the distillation method, please refer to the original work:

@inproceedings{yang2024clip,
  title={CLIP-KD: An Empirical Study of CLIP Model Distillation},
  author={Yang, Chuanguang and An, Zhulin and Huang, Libo and Bi, Junyu and Yu, Xinqiang and Yang, Han and Diao, Boyu and Xu, Yongjun},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2024}
}
Model size: 46.1M params (Safetensors, tensor type F32)
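As a rough check on the download size, the weight storage follows from the parameter count times the bytes per element. The sketch below assumes dense tensors and ignores the small Safetensors header. Because 46.1M is a rounded count, the result is approximate. The helper name and the extra F16/BF16 entries are illustrative assumptions, not properties of this checkpoint.

```python
# Bytes per element for a few common dtypes (only F32 applies to this card).
BYTES_PER_PARAM = {"F32": 4, "F16": 2, "BF16": 2}

def weight_bytes(num_params: int, dtype: str = "F32") -> int:
    """Hypothetical helper: raw tensor storage, excluding file-format overhead."""
    return num_params * BYTES_PER_PARAM[dtype]

n_params = 46_100_000  # rounded parameter count from this card
size = weight_bytes(n_params, "F32")
print(f"{size / 1e6:.1f} MB ({size / 2**20:.1f} MiB)")
```

Under these assumptions the F32 weights take about 184 MB. A half-precision copy would take roughly half that.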

Datasets used to train romrawinjp/clip-kd_ViT-T-16-Laion400M_KD-CC3M12M