Collections
Collections including paper arxiv:2502.01061
-
DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes
Paper • 2412.11100 • Published • 7
LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity
Paper • 2412.09856 • Published • 10
DisPose: Disentangling Pose Guidance for Controllable Human Image Animation
Paper • 2412.09349 • Published • 8
MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation
Paper • 2412.04448 • Published • 9
-
One Shot, One Talk: Whole-body Talking Avatar from a Single Image
Paper • 2412.01106 • Published • 20
MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation
Paper • 2412.04448 • Published • 9
IDOL: Instant Photorealistic 3D Human Creation from a Single Image
Paper • 2412.14963 • Published • 6
OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models
Paper • 2502.01061 • Published • 167
-
Generative World Explorer
Paper • 2411.11844 • Published • 76
Chirpy3D: Continuous Part Latents for Creative 3D Bird Generation
Paper • 2501.04144 • Published • 18
SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images
Paper • 2501.04689 • Published • 17
SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration
Paper • 2501.01320 • Published • 11
-
Differential Transformer
Paper • 2410.05258 • Published • 169
PaliGemma 2: A Family of Versatile VLMs for Transfer
Paper • 2412.03555 • Published • 126
VisionZip: Longer is Better but Not Necessary in Vision Language Models
Paper • 2412.04467 • Published • 106
o1-Coder: an o1 Replication for Coding
Paper • 2412.00154 • Published • 43
-
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
Paper • 2402.17177 • Published • 87
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Paper • 2402.17485 • Published • 191
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
Paper • 2403.00522 • Published • 45
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
Paper • 2403.04692 • Published • 39