- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 144
- Orion-14B: Open-source Multilingual Large Language Models
  Paper • 2401.12246 • Published • 12
- MambaByte: Token-free Selective State Space Model
  Paper • 2401.13660 • Published • 51
- MM-LLMs: Recent Advances in MultiModal Large Language Models
  Paper • 2401.13601 • Published • 45

Collections
Collections including paper arxiv:2405.14129
- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 25
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 12
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 38
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 19

- MM-LLMs: Recent Advances in MultiModal Large Language Models
  Paper • 2401.13601 • Published • 45
- Orion-14B: Open-source Multilingual Large Language Models
  Paper • 2401.12246 • Published • 12
- Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model
  Paper • 2405.09215 • Published • 18
- AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability
  Paper • 2405.14129 • Published • 12

- Vript: A Video Is Worth Thousands of Words
  Paper • 2406.06040 • Published • 23
- ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
  Paper • 2406.04325 • Published • 72
- MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark
  Paper • 2406.01574 • Published • 43
- Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
  Paper • 2405.21075 • Published • 19

- Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs
  Paper • 2403.12596 • Published • 9
- Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models
  Paper • 2404.13013 • Published • 30
- PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
  Paper • 2404.16994 • Published • 35
- AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability
  Paper • 2405.14129 • Published • 12