SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers Paper • 2407.09413 • Published Jul 12, 2024 • 9
MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct Paper • 2409.05840 • Published Sep 9, 2024 • 45
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning Paper • 2409.12568 • Published Sep 19, 2024 • 47
LVD-2M: A Long-take Video Dataset with Temporally Dense Captions Paper • 2410.10816 • Published Oct 14, 2024 • 19
BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions Paper • 2411.07461 • Published Nov 12, 2024 • 21
EgoVid-5M: A Large-Scale Video-Action Dataset for Egocentric Video Generation Paper • 2411.08380 • Published Nov 13, 2024 • 25
VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection Paper • 2411.14794 • Published Nov 22, 2024 • 12
VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interaction Format Paper • 2411.17991 • Published Nov 2024 • 5
GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation Paper • 2411.18499 • Published Nov 2024 • 18
VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation Paper • 2412.00927 • Published Dec 2024 • 26
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale Paper • 2412.05237 • Published Dec 2024 • 46
CompCap: Improving Multimodal Large Language Models with Composite Captions Paper • 2412.05243 • Published Dec 2024 • 18
Maya: An Instruction Finetuned Multilingual Multimodal Model Paper • 2412.07112 • Published Dec 2024 • 25
MIT-10M: A Large Scale Parallel Corpus of Multilingual Image Translation Paper • 2412.07147 • Published Dec 2024 • 5
InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption Paper • 2412.09283 • Published Dec 2024 • 19
Friends-MMC: A Dataset for Multi-modal Multi-party Conversation Understanding Paper • 2412.17295 • Published Dec 2024 • 8