Token-Efficient Long Video Understanding for Multimodal LLMs Paper • 2503.04130 • Published 3 days ago • 61 • 2
The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation Paper • 2503.04606 • Published 3 days ago • 7 • 1
Dedicated Feedback and Edit Models Empower Inference-Time Scaling for Open-Ended General-Domain Tasks Paper • 2503.04378 • Published 3 days ago • 6 • 3
GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control Paper • 2503.03751 • Published 4 days ago • 18 • 4
Running on Zero • 96 • Diffusion Self Distillation 🦀 Generate detailed images from an input image and text prompt
Light-R1 Collection Surpassing R1-Distill from Scratch* with 70k Math Data through Curriculum SFT & DPO • 3 items • Updated 5 days ago • 8
Article A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality • 6 days ago • 57
C4AI Aya Vision Collection Aya Vision is a state-of-the-art family of vision models that brings multimodal capabilities to 23 languages. • 5 items • Updated 5 days ago • 59