EMO2: End-Effector Guided Audio-Driven Avatar Video Generation Paper • 2501.10687 • Published 8 days ago • 11 • 4
Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens Paper • 2501.07730 • Published 13 days ago • 16 • 3
Do generative video models learn physical principles from watching videos? Paper • 2501.09038 • Published 12 days ago • 30 • 3
Learnings from Scaling Visual Tokenizers for Reconstruction and Generation Paper • 2501.09755 • Published 10 days ago • 33 • 4
VideoAuteur: Towards Long Narrative Video Generation Paper • 2501.06173 • Published 16 days ago • 31 • 3
VideoRAG: Retrieval-Augmented Generation over Video Corpus Paper • 2501.05874 • Published 16 days ago • 66 • 6
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs Paper • 2501.06186 • Published 16 days ago • 59
Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model Paper • 2501.05122 • Published 17 days ago • 18 • 3
VidTwin: Video VAE with Decoupled Structure and Dynamics Paper • 2412.17726 • Published Dec 23, 2024 • 8 • 3
STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution Paper • 2501.02976 • Published 20 days ago • 52 • 3
An Empirical Study of Autoregressive Pre-training from Videos Paper • 2501.05453 • Published 17 days ago • 37 • 7
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published 18 days ago • 248 • 41