ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video (ECCV 2024)

This repo hosts the official model checkpoints for "ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video" (ECCV 2024).

Models

We provide the checkpoints before reparameterization. To reparameterize the weights, refer to tools\weight_reparam.py in our code.
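As a rough illustration of what reparameterization means in general, the sketch below folds a purely linear adapter into the frozen linear layer that follows it, so the merged layer carries no extra parameters at inference. This is not the repo's tools\weight_reparam.py: the function name merge_linear_adapter, the matrices down and up, and the assumed adapter form x + up (down x) are all hypothetical, chosen only to show the idea with NumPy.

```python
import numpy as np


def merge_linear_adapter(weight, bias, down, up):
    """Fold an assumed linear adapter into the linear layer that follows it.

    Assumed (hypothetical) forward pass before merging:
        h = x + up @ (down @ x)      # adapter with no nonlinearity
        y = weight @ h + bias        # frozen pre-trained layer
    Because every step is linear, y = weight @ (I + up @ down) @ x + bias.

    Shapes: weight (out, in), bias (out,), down (r, in), up (in, r).
    """
    in_features = weight.shape[1]
    merged_weight = weight @ (np.eye(in_features) + up @ down)
    # The bias is unchanged because the assumed adapter has no bias term.
    return merged_weight, bias


# Small numerical check with random matrices (illustrative sizes only).
rng = np.random.default_rng(0)
d, r = 8, 2
weight = rng.normal(size=(d, d))
bias = rng.normal(size=d)
down = rng.normal(size=(r, d))
up = rng.normal(size=(d, r))
x = rng.normal(size=d)

# Output with the adapter kept as a separate module.
y_adapter = weight @ (x + up @ (down @ x)) + bias

# Output after folding the adapter into the layer weights.
merged_weight, merged_bias = merge_linear_adapter(weight, bias, down, up)
y_merged = merged_weight @ x + merged_bias

print(np.allclose(y_adapter, y_merged))
```

The merged weight has the same shape as the original one, which is consistent with the "New Param (M)" column reading 0 in the tables below. For the actual checkpoints, use the script from the official code rather than this sketch.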

Kinetics 400

Backbone   Pretrain   GFLOPs   Param (M)   New Param (M)   acc@1   Views
ViT-B/16   CLIP       422      86          0               83.0    8x1x3
ViT-L/14   CLIP       1946     304         0               86.3    8x1x3
ViT-L/14   CLIP       7783     304         0               87.2    32x1x3

Something Something V2

Backbone   Pretrain   GFLOPs   Param (M)   New Param (M)   acc@1   Views
ViT-L/14   CLIP       7783     304         0               72.2    32x3x1

If you find our work useful in your research, please cite:

@article{li2023zeroi2v,
  title={ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video},
  author={Li, Xinhao and Zhu, Yuhan and Wang, Limin},
  journal={arXiv preprint arXiv:2310.01324},
  year={2023}
}