image - a zzfive Collection

Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild

Paper • 2401.13627 • Published Jan 24, 2024 • 74

UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion

Paper • 2401.13388 • Published Jan 24, 2024 • 11

DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing

Paper • 2402.02583 • Published Feb 4, 2024 • 8

SDXL-Lightning: Progressive Adversarial Diffusion Distillation

Paper • 2402.13929 • Published Feb 21, 2024 • 27

T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching

Paper • 2402.14167 • Published Feb 21, 2024 • 12

Subobject-level Image Tokenization

Paper • 2402.14327 • Published Feb 22, 2024 • 18

Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition

Paper • 2402.15504 • Published Feb 23, 2024 • 22

Multi-LoRA Composition for Image Generation

Paper • 2402.16843 • Published Feb 26, 2024 • 31

EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

Paper • 2402.17485 • Published Feb 27, 2024 • 191

DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

Paper • 2402.19481 • Published Feb 29, 2024 • 22

Trajectory Consistency Distillation

Paper • 2402.19159 • Published Feb 29, 2024 • 16

RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization

Paper • 2403.00483 • Published Mar 1, 2024 • 15

ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models

Paper • 2403.02084 • Published Mar 4, 2024 • 15

OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on

Paper • 2403.01779 • Published Mar 4, 2024 • 30

Scaling Rectified Flow Transformers for High-Resolution Image Synthesis

Paper • 2403.03206 • Published Mar 5, 2024 • 63

ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment

Paper • 2403.05135 • Published Mar 8, 2024 • 42

Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM

Paper • 2403.07487 • Published Mar 12, 2024 • 15

Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering

Paper • 2403.09622 • Published Mar 14, 2024 • 17

StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control

Paper • 2403.09055 • Published Mar 14, 2024 • 25

IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models

Paper • 2403.13535 • Published Mar 20, 2024 • 22

DepthFM: Fast Monocular Depth Estimation with Flow Matching

Paper • 2403.13788 • Published Mar 20, 2024 • 17

Magic Fixup: Streamlining Photo Editing by Watching Dynamic Videos

Paper • 2403.13044 • Published Mar 19, 2024 • 15

FlashFace: Human Image Personalization with High-fidelity Identity Preservation

Paper • 2403.17008 • Published Mar 25, 2024 • 20

SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions

Paper • 2403.16627 • Published Mar 25, 2024 • 20

ViTAR: Vision Transformer with Any Resolution

Paper • 2403.18361 • Published Mar 27, 2024 • 54

ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion

Paper • 2403.18818 • Published Mar 27, 2024 • 26

CosmicMan: A Text-to-Image Foundation Model for Humans

Paper • 2404.01294 • Published Apr 1, 2024 • 16

Condition-Aware Neural Network for Controlled Image Generation

Paper • 2404.01143 • Published Apr 1, 2024 • 13

Measuring Style Similarity in Diffusion Models

Paper • 2404.01292 • Published Apr 1, 2024 • 17

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

Paper • 2404.03653 • Published Apr 4, 2024 • 36

RL for Consistency Models: Faster Reward Guided Text-to-Image Generation

Paper • 2404.03673 • Published Mar 25, 2024 • 16

ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback

Paper • 2404.07987 • Published Apr 11, 2024 • 47

Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies

Paper • 2404.08197 • Published Apr 12, 2024 • 29

Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model

Paper • 2404.09967 • Published Apr 15, 2024 • 21

HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing

Paper • 2404.09990 • Published Apr 15, 2024 • 13

Dynamic Typography: Bringing Words to Life

Paper • 2404.11614 • Published Apr 17, 2024 • 45

MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation

Paper • 2404.11565 • Published Apr 17, 2024 • 15

Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis

Paper • 2404.13686 • Published Apr 21, 2024 • 28

Align Your Steps: Optimizing Sampling Schedules in Diffusion Models

Paper • 2404.14507 • Published Apr 22, 2024 • 23

PuLID: Pure and Lightning ID Customization via Contrastive Alignment

Paper • 2404.16022 • Published Apr 24, 2024 • 24

Editable Image Elements for Controllable Synthesis

Paper • 2404.16029 • Published Apr 24, 2024 • 11

ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning

Paper • 2404.15449 • Published Apr 23, 2024 • 13

ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving

Paper • 2404.16771 • Published Apr 25, 2024 • 18

StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation

Paper • 2405.01434 • Published May 2, 2024 • 55

Customizing Text-to-Image Models with a Single Image Pair

Paper • 2405.01536 • Published May 2, 2024 • 22

Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control

Paper • 2405.12970 • Published May 21, 2024 • 25

RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance

Paper • 2405.14677 • Published May 23, 2024 • 12

DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis

Paper • 2405.14224 • Published May 23, 2024 • 16

Semantica: An Adaptable Image-Conditioned Diffusion Model

Paper • 2405.14857 • Published May 23, 2024 • 11

EM Distillation for One-step Diffusion Models

Paper • 2405.16852 • Published May 27, 2024 • 12

Greedy Growing Enables High-Resolution Pixel-Based Diffusion Models

Paper • 2405.16759 • Published May 27, 2024 • 8

Phased Consistency Model

Paper • 2405.18407 • Published May 28, 2024 • 48

BitsFusion: 1.99 bits Weight Quantization of Diffusion Model

Paper • 2406.04333 • Published Jun 6, 2024 • 38

pOps: Photo-Inspired Diffusion Operators

Paper • 2406.01300 • Published Jun 3, 2024 • 18

Zero-shot Image Editing with Reference Imitation

Paper • 2406.07547 • Published Jun 11, 2024 • 33

An Image is Worth 32 Tokens for Reconstruction and Generation

Paper • 2406.07550 • Published Jun 11, 2024 • 58

AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising

Paper • 2406.06911 • Published Jun 11, 2024 • 12

FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation

Paper • 2406.08392 • Published Jun 12, 2024 • 21

Depth Anything V2

Paper • 2406.09414 • Published Jun 13, 2024 • 97

An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels

Paper • 2406.09415 • Published Jun 13, 2024 • 51

Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models

Paper • 2406.09416 • Published Jun 13, 2024 • 28

EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts

Paper • 2406.09162 • Published Jun 13, 2024 • 14

Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering

Paper • 2406.10208 • Published Jun 14, 2024 • 22

Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models

Paper • 2406.11831 • Published Jun 17, 2024 • 22

The Devil is in the Details: StyleFeatureEditor for Detail-Rich StyleGAN Inversion and High Quality Image Editing

Paper • 2406.10601 • Published Jun 15, 2024 • 66

Invertible Consistency Distillation for Text-Guided Image Editing in Around 7 Steps

Paper • 2406.14539 • Published Jun 20, 2024 • 27

DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation

Paper • 2406.16855 • Published Jun 24, 2024 • 55

Aligning Diffusion Models with Noise-Conditioned Perception

Paper • 2406.17636 • Published Jun 25, 2024 • 27

Magic Insert: Style-Aware Drag-and-Drop

Paper • 2407.02489 • Published Jul 2, 2024 • 22

DisCo-Diff: Enhancing Continuous Diffusion Models with Discrete Latents

Paper • 2407.03300 • Published Jul 3, 2024 • 13

PartCraft: Crafting Creative Objects by Parts

Paper • 2407.04604 • Published Jul 5, 2024 • 5

SVGCraft: Beyond Single Object Text-to-SVG Synthesis with Comprehensive Canvas Layout

Paper • 2404.00412 • Published Mar 30, 2024 • 2

DataDream: Few-shot Guided Dataset Generation

Paper • 2407.10910 • Published Jul 15, 2024 • 10

Scaling Diffusion Transformers to 16 Billion Parameters

Paper • 2407.11633 • Published Jul 16, 2024 • 26

IMAGDressing-v1: Customizable Virtual Dressing

Paper • 2407.12705 • Published Jul 17, 2024 • 13

CGB-DM: Content and Graphic Balance Layout Generation with Transformer-based Diffusion Model

Paper • 2407.15233 • Published Jul 21, 2024 • 6

Artist: Aesthetically Controllable Text-Driven Stylization without Training

Paper • 2407.15842 • Published Jul 22, 2024 • 14

Discrete Flow Matching

Paper • 2407.15595 • Published Jul 22, 2024 • 13

ViPer: Visual Personalization of Generative Models via Individual Preference Learning

Paper • 2407.17365 • Published Jul 24, 2024 • 12

Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model

Paper • 2407.16982 • Published Jul 24, 2024 • 41

BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation

Paper • 2407.17952 • Published Jul 25, 2024 • 31

SHIC: Shape-Image Correspondences with no Keypoint Supervision

Paper • 2407.18907 • Published Jul 26, 2024 • 41

TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models

Paper • 2408.00735 • Published Aug 1, 2024 • 17

Smoothed Energy Guidance: Guiding Diffusion Models with Reduced Energy Curvature of Attention

Paper • 2408.00760 • Published Aug 1, 2024 • 7

Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining

Paper • 2408.02657 • Published Aug 5, 2024 • 34

ProCreate, Don't Reproduce! Propulsive Energy Diffusion for Creative Generation

Paper • 2408.02226 • Published Aug 5, 2024 • 12

IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts

Paper • 2408.03209 • Published Aug 6, 2024 • 22

Openstory++: A Large-scale Dataset and Benchmark for Instance-aware Open-domain Visual Storytelling

Paper • 2408.03695 • Published Aug 7, 2024 • 13

ControlNeXt: Powerful and Efficient Control for Image and Video Generation

Paper • 2408.06070 • Published Aug 12, 2024 • 53

BRAT: Bonus oRthogonAl Token for Architecture Agnostic Textual Inversion

Paper • 2408.04785 • Published Aug 8, 2024 • 9

UniPortrait: A Unified Framework for Identity-Preserving Single- and Multi-Human Image Personalization

Paper • 2408.05939 • Published Aug 12, 2024 • 15

Imagen 3

Paper • 2408.07009 • Published Aug 13, 2024 • 61

ZePo: Zero-Shot Portrait Stylization with Faster Sampling

Paper • 2408.05492 • Published Aug 10, 2024 • 7

Generative Photomontage

Paper • 2408.07116 • Published Aug 13, 2024 • 20

JPEG-LM: LLMs as Image Generators with Canonical Codec Representations

Paper • 2408.08459 • Published Aug 15, 2024 • 45

TurboEdit: Instant text-based image editing

Paper • 2408.08332 • Published Aug 14, 2024 • 20

Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering

Paper • 2408.09702 • Published Aug 19, 2024 • 11

TraDiffusion: Trajectory-Based Training-Free Image Generation

Paper • 2408.09739 • Published Aug 19, 2024 • 9

MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning

Paper • 2408.11001 • Published Aug 20, 2024 • 12

The Brittleness of AI-Generated Image Watermarking Techniques: Examining Their Robustness Against Visual Paraphrasing Attacks

Paper • 2408.10446 • Published Aug 19, 2024 • 9

Scalable Autoregressive Image Generation with Mamba

Paper • 2408.12245 • Published Aug 22, 2024 • 26

CODE: Confident Ordinary Differential Editing

Paper • 2408.12418 • Published Aug 22, 2024 • 4

SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher

Paper • 2408.14176 • Published Aug 26, 2024 • 62

Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image Generation

Paper • 2408.14819 • Published Aug 27, 2024 • 21

Distribution Backtracking Builds A Faster Convergence Trajectory for One-step Diffusion Distillation

Paper • 2408.15991 • Published Aug 28, 2024 • 16

CSGO: Content-Style Composition in Text-to-Image Generation

Paper • 2408.16766 • Published Aug 29, 2024 • 18

CoRe: Context-Regularized Text Embedding Learning for Text-to-Image Personalization

Paper • 2408.15914 • Published Aug 28, 2024 • 23

VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers

Paper • 2408.17131 • Published Aug 30, 2024 • 11

LinFusion: 1 GPU, 1 Minute, 16K Image

Paper • 2409.02097 • Published Sep 3, 2024 • 33

Accurate Compression of Text-to-Image Diffusion Models via Vector Quantization

Paper • 2409.00492 • Published Aug 31, 2024 • 11

Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing

Paper • 2409.01322 • Published Sep 2, 2024 • 95

IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation

Paper • 2409.08240 • Published Sep 12, 2024 • 22

InstantDrag: Improving Interactivity in Drag-based Image Editing

Paper • 2409.08857 • Published Sep 13, 2024 • 33

StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation

Paper • 2409.12576 • Published Sep 19, 2024 • 16

Imagine yourself: Tuning-Free Personalized Image Generation

Paper • 2409.13346 • Published Sep 20, 2024 • 69

Colorful Diffuse Intrinsic Image Decomposition in the Wild

Paper • 2409.13690 • Published Sep 20, 2024 • 14

Improvements to SDXL in NovelAI Diffusion V3

Paper • 2409.15997 • Published Sep 24, 2024 • 13

Pixel-Space Post-Training of Latent Diffusion Models

Paper • 2409.17565 • Published Sep 26, 2024 • 21

OmniBooth: Learning Latent Control for Image Synthesis with Multi-modal Instruction

Paper • 2410.04932 • Published Oct 7, 2024 • 9

Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding

Paper • 2410.01699 • Published Oct 2, 2024 • 18

IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation

Paper • 2410.07171 • Published Oct 9, 2024 • 42

Story-Adapter: A Training-free Iterative Framework for Long Story Visualization

Paper • 2410.06244 • Published Oct 8, 2024 • 19

Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models

Paper • 2410.02416 • Published Oct 3, 2024 • 27

DICE: Discrete Inversion Enabling Controllable Editing for Multinomial Diffusion and Masked Generative Models

Paper • 2410.08207 • Published Oct 10, 2024 • 19

Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis

Paper • 2410.08261 • Published Oct 10, 2024 • 50

EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models

Paper • 2410.07133 • Published Oct 9, 2024 • 19

Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations

Paper • 2410.10792 • Published Oct 14, 2024 • 30

Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices

Paper • 2410.11795 • Published Oct 15, 2024 • 18

Improving Long-Text Alignment for Text-to-Image Diffusion Models

Paper • 2410.11817 • Published Oct 15, 2024 • 15

Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens

Paper • 2410.13863 • Published Oct 17, 2024 • 37

VidPanos: Generative Panoramic Videos from Casual Panning Videos

Paper • 2410.13832 • Published Oct 17, 2024 • 13

FiTv2: Scalable and Improved Flexible Vision Transformer for Diffusion Model

Paper • 2410.13925 • Published Oct 17, 2024 • 24

BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities

Paper • 2410.14672 • Published Oct 18, 2024 • 8

Scalable Ranked Preference Optimization for Text-to-Image Generation

Paper • 2410.18013 • Published Oct 23, 2024 • 15

Stable Consistency Tuning: Understanding and Improving Consistency Models

Paper • 2410.18958 • Published Oct 24, 2024 • 10

DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation

Paper • 2410.18666 • Published Oct 24, 2024 • 19

Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders

Paper • 2410.22366 • Published Oct 28, 2024 • 78

Constant Acceleration Flow

Paper • 2411.00322 • Published Nov 1, 2024 • 24

In-Context LoRA for Diffusion Transformers

Paper • 2410.23775 • Published Oct 31, 2024 • 11

Training-free Regional Prompting for Diffusion Transformers

Paper • 2411.02395 • Published Nov 4, 2024 • 25

Constrained Diffusion Implicit Models

Paper • 2411.00359 • Published Nov 1, 2024 • 6

SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

Paper • 2411.05007 • Published Nov 7, 2024 • 18

Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models

Paper • 2411.07232 • Published Nov 11, 2024 • 65

OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision

Paper • 2411.07199 • Published Nov 11, 2024 • 47

Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models

Paper • 2411.07126 • Published Nov 11, 2024 • 29

Watermark Anything with Localized Messages

Paper • 2411.07231 • Published Nov 11, 2024 • 20

JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation

Paper • 2411.07975 • Published Nov 12, 2024 • 30

Scaling Properties of Diffusion Models for Perceptual Tasks

Paper • 2411.08034 • Published Nov 12, 2024 • 13

MagicQuill: An Intelligent Interactive Image Editing System

Paper • 2411.09703 • Published Nov 14, 2024 • 68

Inconsistencies In Consistency Models: Better ODE Solving Does Not Imply Better Samples

Paper • 2411.08954 • Published Nov 13, 2024 • 9

Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement

Paper • 2411.06558 • Published Nov 10, 2024 • 34

FitDiT: Advancing the Authentic Garment Details for High-fidelity Virtual Try-on

Paper • 2411.10499 • Published Nov 15, 2024 • 13

Continuous Speculative Decoding for Autoregressive Image Generation

Paper • 2411.11925 • Published Nov 18, 2024 • 16

Stylecodes: Encoding Stylistic Information For Image Generation

Paper • 2411.12811 • Published Nov 19, 2024 • 12

Generating Compositional Scenes via Text-to-image RGBA Instance Generation

Paper • 2411.10913 • Published Nov 16, 2024 • 3

Stable Flow: Vital Layers for Training-Free Image Editing

Paper • 2411.14430 • Published Nov 21, 2024 • 22

Style-Friendly SNR Sampler for Style-Driven Generation

Paper • 2411.14793 • Published Nov 22, 2024 • 36

OminiControl: Minimal and Universal Control for Diffusion Transformer

Paper • 2411.15098 • Published Nov 22, 2024 • 55

MyTimeMachine: Personalized Facial Age Transformation

Paper • 2411.14521 • Published Nov 21, 2024 • 20

Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator

Paper • 2411.15466 • Published Nov 23, 2024 • 35

One Diffusion to Generate Them All

Paper • 2411.16318 • Published Nov 25, 2024 • 28

Controllable Human Image Generation with Personalized Multi-Garments

Paper • 2411.16801 • Published Nov 25, 2024 • 4

ROICtrl: Boosting Instance Control for Visual Generation

Paper • 2411.17949 • Published Nov 27, 2024 • 82

DreamCache: Finetuning-Free Lightweight Personalized Image Generation via Feature Caching

Paper • 2411.17786 • Published Nov 26, 2024 • 12

Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient

Paper • 2411.17787 • Published Nov 26, 2024 • 12

Diffusion Self-Distillation for Zero-Shot Customized Image Generation

Paper • 2411.18616 • Published Nov 27, 2024 • 15

Omegance: A Single Parameter for Various Granularities in Diffusion-Based Synthesis

Paper • 2411.17769 • Published Nov 26, 2024 • 7

Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing

Paper • 2411.16832 • Published Nov 25, 2024 • 2

TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models

Paper • 2411.18350 • Published Nov 27, 2024 • 27

FAM Diffusion: Frequency and Attention Modulation for High-Resolution Image Generation with Stable Diffusion

Paper • 2411.18552 • Published Nov 27, 2024 • 18

Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis

Paper • 2412.01819 • Published Dec 2, 2024 • 35

Art-Free Generative Models: Art Creation Without Graphic Art Knowledge

Paper • 2412.00176 • Published Nov 29, 2024 • 8

SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance

Paper • 2412.02687 • Published Dec 3, 2024 • 109

TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation

Paper • 2412.03069 • Published Dec 4, 2024 • 32

LumiNet: Latent Intrinsics Meets Diffusion Models for Indoor Scene Relighting

Paper • 2412.00177 • Published Nov 29, 2024 • 7

A Noise is Worth Diffusion Guidance

Paper • 2412.03895 • Published Dec 5, 2024 • 29

Negative Token Merging: Image-based Adversarial Feature Guidance

Paper • 2412.01339 • Published Dec 2, 2024 • 23

AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models

Paper • 2412.04146 • Published Dec 5, 2024 • 23

Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis

Paper • 2412.04431 • Published Dec 5, 2024 • 18

ZipAR: Accelerating Autoregressive Image Generation through Spatial Locality

Paper • 2412.04062 • Published Dec 5, 2024 • 9

SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion

Paper • 2412.04301 • Published Dec 5, 2024 • 36

PanoDreamer: 3D Panorama Synthesis from a Single Image

Paper • 2412.04827 • Published Dec 6, 2024 • 11

DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation

Paper • 2412.07589 • Published Dec 10, 2024 • 46

Hidden in the Noise: Two-Stage Robust Watermarking for Images

Paper • 2412.04653 • Published Dec 5, 2024 • 28

FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models

Paper • 2412.07674 • Published Dec 10, 2024 • 20

UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics

Paper • 2412.07774 • Published Dec 10, 2024 • 28

LoRA.rar: Learning to Merge LoRAs via Hypernetworks for Subject-Style Conditioned Image Generation

Paper • 2412.05148 • Published Dec 6, 2024 • 11

Learning Flow Fields in Attention for Controllable Person Image Generation

Paper • 2412.08486 • Published Dec 11, 2024 • 34

FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models

Paper • 2412.08629 • Published Dec 11, 2024 • 12

StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements

Paper • 2412.08503 • Published Dec 11, 2024 • 8

EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM

Paper • 2412.09618 • Published Dec 12, 2024 • 21

SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training

Paper • 2412.09619 • Published Dec 12, 2024 • 25

LoRACLR: Contrastive Adaptation for Customization of Diffusion Models

Paper • 2412.09622 • Published Dec 12, 2024 • 8

FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion

Paper • 2412.09626 • Published Dec 12, 2024 • 20

ObjectMate: A Recurrence Prior for Object Insertion and Subject-Driven Generation

Paper • 2412.08645 • Published Dec 11, 2024 • 11

FireFlow: Fast Inversion of Rectified Flow for Image Semantic Editing

Paper • 2412.07517 • Published Dec 10, 2024 • 11

FluxSpace: Disentangled Semantic Editing in Rectified Flow Transformers

Paper • 2412.09611 • Published Dec 12, 2024 • 10

BrushEdit: All-In-One Image Inpainting and Editing

Paper • 2412.10316 • Published Dec 13, 2024 • 33

ColorFlow: Retrieval-Augmented Image Sequence Colorization

Paper • 2412.11815 • Published Dec 16, 2024 • 26

Causal Diffusion Transformers for Generative Modeling

Paper • 2412.12095 • Published Dec 16, 2024 • 23

FashionComposer: Compositional Fashion Image Generation

Paper • 2412.14168 • Published Dec 18, 2024 • 16

ChatDiT: A Training-Free Baseline for Task-Agnostic Free-Form Chatting with Diffusion Transformers

Paper • 2412.12571 • Published Dec 17, 2024 • 8

Flowing from Words to Pixels: A Framework for Cross-Modality Evolution

Paper • 2412.15213 • Published Dec 19, 2024 • 26

Affordance-Aware Object Insertion via Mask-Aware Dual Diffusion

Paper • 2412.14462 • Published Dec 19, 2024 • 15

1.58-bit FLUX

Paper • 2412.18653 • Published Dec 24, 2024 • 80

The Superposition of Diffusion Models Using the Itô Density Estimator

Paper • 2412.17762 • Published Dec 23, 2024 • 12

From Elements to Design: A Layered Approach for Automatic Graphic Design Composition

Paper • 2412.19712 • Published Dec 27, 2024 • 15

VMix: Improving Text-to-Image Diffusion Model with Cross-Attention Mixing Control

Paper • 2412.20800 • Published Dec 30, 2024 • 10

DepthMaster: Taming Diffusion Models for Monocular Depth Estimation

Paper • 2501.02576 • Published Jan 5 • 15

MagicFace: High-Fidelity Facial Expression Editing with Action-Unit Control

Paper • 2501.02260 • Published Jan 4 • 5

The GAN is dead; long live the GAN! A Modern GAN Baseline

Paper • 2501.05441 • Published Jan 9 • 88

MangaNinja: Line Art Colorization with Precise Reference Following

Paper • 2501.08332 • Published Jan 14 • 57

Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models

Paper • 2501.06751 • Published Jan 12 • 31

Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens

Paper • 2501.07730 • Published Jan 13 • 16

FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors

Paper • 2501.08225 • Published Jan 14 • 18

3DIS-FLUX: simple and efficient multi-instance generation with DiT rendering

Paper • 2501.05131 • Published Jan 9 • 34

Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps

Paper • 2501.09732 • Published Jan 16 • 70

SynthLight: Portrait Relighting with Diffusion Model by Learning to Re-render Synthetic Faces

Paper • 2501.09756 • Published Jan 16 • 19

Textoon: Generating Vivid 2D Cartoon Characters from Text Descriptions

Paper • 2501.10020 • Published Jan 17 • 22

TokenVerse: Versatile Multi-concept Personalization in Token Modulation Space

Paper • 2501.12224 • Published Jan 21 • 46

GPS as a Control Signal for Image Generation

Paper • 2501.12390 • Published Jan 21 • 12

Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step

Paper • 2501.13926 • Published Jan 23 • 37

One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt

Paper • 2501.13554 • Published Jan 23 • 9

AdaIR: Adaptive All-in-One Image Restoration via Frequency Mining and Modulation

Paper • 2403.14614 • Published Mar 21, 2024 • 4

Denoising as Adaptation: Noise-Space Domain Adaptation for Image Restoration

Paper • 2406.18516 • Published Jun 26, 2024 • 3

Visual Generation Without Guidance

Paper • 2501.15420 • Published Jan 26 • 8

SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer

Paper • 2501.18427 • Published Jan 30 • 17

Inverse Bridge Matching Distillation

Paper • 2502.01362 • Published Feb 3 • 27

LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer

Paper • 2502.01105 • Published Feb 3 • 20

Weak-to-Strong Diffusion with Reflection

Paper • 2502.00473 • Published Feb 1 • 22

Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More

Paper • 2502.03738 • Published about 1 month ago • 10

Dual Caption Preference Optimization for Diffusion Models

Paper • 2502.06023 • Published 27 days ago • 9

Skrr: Skip and Re-use Text Encoder Layers for Memory Efficient Text-to-Image Generation

Paper • 2502.08690 • Published 24 days ago • 41

ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation

Paper • 2502.09411 • Published 23 days ago • 18

Precise Parameter Localization for Textual Generation in Diffusion Models

Paper • 2502.09935 • Published 23 days ago • 11

Region-Adaptive Sampling for Diffusion Transformers

Paper • 2502.10389 • Published 22 days ago • 52

I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models

Paper • 2502.10458 • Published 25 days ago • 30

Diffusion Models without Classifier-free Guidance

Paper • 2502.12154 • Published 19 days ago • 4

PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data

Paper • 2502.14397 • Published 17 days ago • 38

One-step Diffusion Models with f-Divergence Distribution Matching

Paper • 2502.15681 • Published 15 days ago • 6

DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks

Paper • 2502.17157 • Published 13 days ago • 51

GCC: Generative Color Constancy via Diffusing a Color Checker

Paper • 2502.17435 • Published 12 days ago • 27

ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation

Paper • 2502.18364 • Published 11 days ago • 32

KV-Edit: Training-Free Image Editing for Precise Background Preservation

Paper • 2502.17363 • Published 12 days ago • 32

K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs

Paper • 2502.18461 • Published 11 days ago • 15

LDGen: Enhancing Text-to-Image Synthesis via Large Language Model-Driven Language Representation

Paper • 2502.18302 • Published 11 days ago • 4

GHOST 2.0: generative high-fidelity one shot transfer of heads

Paper • 2502.18417 • Published 11 days ago • 62

Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator

Paper • 2502.19204 • Published 10 days ago • 11

UniTok: A Unified Tokenizer for Visual Generation and Understanding

Paper • 2502.20321 • Published 9 days ago • 27

Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think

Paper • 2502.20172 • Published 9 days ago • 26

FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute

Paper • 2502.20126 • Published 10 days ago • 19

Training Consistency Models with Variational Noise Coupling

Paper • 2502.18197 • Published 12 days ago • 5

How far can we go with ImageNet for Text-to-Image generation?

Paper • 2502.21318 • Published 8 days ago • 25

RectifiedHR: Enable Efficient High-Resolution Image Generation via Energy Rectification

Paper • 2503.02537 • Published 5 days ago • 9