Text-to-CAD Generation Through Infusing Visual Feedback in Large Language Models Paper • 2501.19054 • Published Jan 31 • 10
VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models Paper • 2502.02492 • Published Feb 4 • 61
Towards Physical Understanding in Video Generation: A 3D Point Regularization Approach Paper • 2502.03639 • Published Feb 5 • 9
MotionLab: Unified Human Motion Generation and Editing via the Motion-Condition-Motion Paradigm Paper • 2502.02358 • Published Feb 4 • 18
MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation Paper • 2502.04299 • Published Feb 6 • 18
Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment Paper • 2502.04328 • Published Feb 6 • 29
ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features Paper • 2502.04320 • Published Feb 6 • 35
Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More Paper • 2502.03738 • Published Feb 6 • 11
On-device Sora: Enabling Diffusion-Based Text-to-Video Generation for Mobile Devices Paper • 2502.04363 • Published Feb 5 • 12
AuraFusion360: Augmented Unseen Region Alignment for Reference-based 360° Unbounded Scene Inpainting Paper • 2502.05176 • Published Feb 2025 • 32
VideoRoPE: What Makes for Good Video Rotary Position Embedding? Paper • 2502.05173 • Published Feb 2025 • 64
Goku: Flow Based Video Generative Foundation Models Paper • 2502.04896 • Published Feb 2025 • 96
ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates Paper • 2502.06772 • Published Feb 2025 • 21
Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT Paper • 2502.06782 • Published Feb 2025 • 13
Efficient-vDiT: Efficient Video Diffusion Transformers With Attention Tile Paper • 2502.06155 • Published Feb 2025 • 9
DreamDPO: Aligning Text-to-3D Generation with Human Preferences via Direct Preference Optimization Paper • 2502.04370 • Published Feb 5 • 7