SmolVLM 256M & 500M Collection Collection for models & demos for even smoller SmolVLM release • 12 items • Updated 4 days ago • 54
EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents Paper • 2501.11858 • Published 7 days ago • 3
Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos Paper • 2501.13826 • Published 4 days ago • 19
Temporal Preference Optimization for Long-Form Video Understanding Paper • 2501.13919 • Published 4 days ago • 18
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding Paper • 2501.13106 • Published 5 days ago • 69
MSTS: A Multimodal Safety Test Suite for Vision-Language Models Paper • 2501.10057 • Published 10 days ago • 8
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model Paper • 2501.12368 • Published 6 days ago • 37
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding Paper • 2501.12380 • Published 6 days ago • 76
Learnings from Scaling Visual Tokenizers for Reconstruction and Generation Paper • 2501.09755 • Published 11 days ago • 33