Animate-X: Universal Character Image Animation with Enhanced Motion Representation Paper • 2410.10306 • Published Oct 14 • 54
ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning Paper • 2411.05003 • Published Nov 7 • 70
TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation Paper • 2411.04709 • Published Nov 5 • 25
IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation Paper • 2410.07171 • Published Oct 9 • 41
Story-Adapter: A Training-free Iterative Framework for Long Story Visualization Paper • 2410.06244 • Published Oct 8 • 19
How Far is Video Generation from World Model: A Physical Law Perspective Paper • 2411.02385 • Published Nov 4 • 33
Training-free Regional Prompting for Diffusion Transformers Paper • 2411.02395 • Published Nov 4 • 25
AutoVFX: Physically Realistic Video Editing from Natural Language Instructions Paper • 2411.02394 • Published Nov 4 • 17
Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders Paper • 2410.22366 • Published Oct 28 • 77
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer Paper • 2410.10812 • Published Oct 14 • 15
DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control Paper • 2410.13830 • Published Oct 17 • 23
SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models Paper • 2411.05007 • Published Nov 7 • 16
Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models Paper • 2411.07232 • Published Nov 11 • 62
OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision Paper • 2411.07199 • Published Nov 11 • 45
MagicQuill: An Intelligent Interactive Image Editing System Paper • 2411.09703 • Published Nov 14 • 57
AnimateAnything: Consistent and Controllable Animation for Video Generation Paper • 2411.10836 • Published Nov 16 • 23
Stylecodes: Encoding Stylistic Information For Image Generation Paper • 2411.12811 • Published Nov 19 • 11
VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement Paper • 2411.15115 • Published Nov 22 • 9
OminiControl: Minimal and Universal Control for Diffusion Transformer Paper • 2411.15098 • Published Nov 22 • 53
OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows Paper • 2412.01169 • Published 24 days ago • 11
SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance Paper • 2412.02687 • Published 22 days ago • 109
NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training Paper • 2412.02030 • Published 23 days ago • 18
MotionShop: Zero-Shot Motion Transfer in Video Diffusion Models with Mixture of Score Guidance Paper • 2412.05355 • Published 19 days ago • 7
Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis Paper • 2412.04431 • Published 20 days ago • 16
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation Paper • 2412.07589 • Published 15 days ago • 45
FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models Paper • 2412.07674 • Published 15 days ago • 20
UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics Paper • 2412.07774 • Published 15 days ago • 25
LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation Paper • 2412.05148 • Published 19 days ago • 11
ObjCtrl-2.5D: Training-free Object Control with Camera Poses Paper • 2412.07721 • Published 15 days ago • 8
StyleMaster: Stylize Your Video with Artistic Generation and Translation Paper • 2412.07744 • Published 15 days ago • 19
FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models Paper • 2412.08629 • Published 14 days ago • 11
DisPose: Disentangling Pose Guidance for Controllable Human Image Animation Paper • 2412.09349 • Published 13 days ago • 7
LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations Paper • 2412.08580 • Published 14 days ago • 44
EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM Paper • 2412.09618 • Published 13 days ago • 21
LoRACLR: Contrastive Adaptation for Customization of Diffusion Models Paper • 2412.09622 • Published 13 days ago • 7
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion Paper • 2412.09626 • Published 13 days ago • 19
Flowing from Words to Pixels: A Framework for Cross-Modality Evolution Paper • 2412.15213 • Published 6 days ago • 25