- Qwen2.5 VL 72B Instruct (Space) • Interact with the Qwen2.5-VL-Chat model using text and files
- Qwen2.5-VL Technical Report • Paper • 2502.13923 • Published • 157
- Qwen/Qwen2.5-VL-72B-Instruct • Image-Text-to-Text • Updated • 270k • 360
- Qwen/Qwen2.5-VL-72B-Instruct-AWQ • Image-Text-to-Text • Updated • 146k • 36
Collections including paper arxiv:2502.13923
- Instruction Following without Instruction Tuning • Paper • 2409.14254 • Published • 29
- Baichuan Alignment Technical Report • Paper • 2410.14940 • Published • 50
- CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution • Paper • 2410.16256 • Published • 60
- Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data • Paper • 2410.18558 • Published • 19

- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters • Paper • 2402.04252 • Published • 26
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models • Paper • 2402.03749 • Published • 13
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding • Paper • 2402.04615 • Published • 43
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss • Paper • 2402.05008 • Published • 22

- DocLLM: A layout-aware generative language model for multimodal document understanding • Paper • 2401.00908 • Published • 180
- COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training • Paper • 2401.00849 • Published • 17
- LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents • Paper • 2311.05437 • Published • 50
- LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing • Paper • 2311.00571 • Published • 41

- Large Language Diffusion Models • Paper • 2502.09992 • Published • 99
- MM-RLHF: The Next Step Forward in Multimodal LLM Alignment • Paper • 2502.10391 • Published • 31
- Diverse Inference and Verification for Advanced Reasoning • Paper • 2502.09955 • Published • 17
- Selective Self-to-Supervised Fine-Tuning for Generalization in Large Language Models • Paper • 2502.08130 • Published • 9

- SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features • Paper • 2502.14786 • Published • 128
- LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models • Paper • 2502.14834 • Published • 24
- Qwen2.5-VL Technical Report • Paper • 2502.13923 • Published • 157
- DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks • Paper • 2502.17157 • Published • 51

- EVEv2: Improved Baselines for Encoder-Free Vision-Language Models • Paper • 2502.06788 • Published • 12
- Scaling Pre-training to One Hundred Billion Data for Vision Language Models • Paper • 2502.07617 • Published • 29
- VideoRoPE: What Makes for Good Video Rotary Position Embedding? • Paper • 2502.05173 • Published • 64
- Qwen2.5-VL Technical Report • Paper • 2502.13923 • Published • 157