Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis Paper • 2412.15322 • Published 6 days ago • 16
Neural LightRig: Unlocking Accurate Object Normal and Material Estimation with Multi-Light Diffusion Paper • 2412.09593 • Published 13 days ago • 17
You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale Paper • 2412.06699 • Published 16 days ago • 11
SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion Paper • 2412.04301 • Published 20 days ago • 32
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion Paper • 2412.04424 • Published 20 days ago • 55
NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training Paper • 2412.02030 • Published 23 days ago • 18
One Shot, One Talk: Whole-body Talking Avatar from a Single Image Paper • 2412.01106 • Published 24 days ago • 18
SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters Paper • 2412.00174 • Published 26 days ago • 22
FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait Paper • 2412.01064 • Published 24 days ago • 25
Bridging the Gap Between Physical Numerical Simulations and Machine Learning: Introducing The Well Article • By rubenohana • 24 days ago • 17
Trajectory Attention for Fine-grained Video Motion Control Paper • 2411.19324 • Published 27 days ago • 12
Inconsistencies In Consistency Models: Better ODE Solving Does Not Imply Better Samples Paper • 2411.08954 • Published Nov 13 • 8
Make-It-Animatable: An Efficient Framework for Authoring Animation-Ready 3D Characters Paper • 2411.18197 • Published 29 days ago • 14
CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models Paper • 2411.18613 • Published 28 days ago • 50
DreamRunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation Paper • 2411.16657 • Published about 1 month ago • 17
OminiControl: Minimal and Universal Control for Diffusion Transformer Paper • 2411.15098 • Published Nov 22 • 53
Multimodal Autoregressive Pre-training of Large Vision Encoders Paper • 2411.14402 • Published Nov 21 • 43