@xiaotianhan on Hugging Face: "🎉 🎉 🎉 Happy to share our recent work. We noticed that image resolution…"

xiaotianhan

posted an update Mar 28, 2024

🎉 🎉 🎉 Happy to share our recent work. We noticed that image resolution plays an important role, both in improving the performance of multimodal large language models (MLLMs) and in Sora-style any-resolution encoder-decoders. We hope this work helps lift the 224x224 resolution limit in ViT.

ViTAR: Vision Transformer with Any Resolution (2403.18361)

merve

Apr 19, 2024

Hiya, are you planning to open-source the models?

xiaotianhan

Apr 23, 2024

Thanks for your interest! Yes, we will open-source our code and pretrained weights soon.

In this post

xiaotianhan Xiaotian Han
merve Merve Noyan