Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published 12 days ago • 131 • 11
LongVILA: Scaling Long-Context Visual Language Models for Long Videos Paper • 2408.10188 • Published Aug 19 • 51 • 3
Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision Paper • 2407.06189 • Published Jul 8 • 25 • 3