Image Generation - a kaizuberbuehler Collection

Image Generation

updated Oct 2

EdgeFusion: On-Device Text-to-Image Generation

Paper • 2404.11925 • Published Apr 18 • 21

Dynamic Typography: Bringing Words to Life

Paper • 2404.11614 • Published Apr 17 • 44

ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback

Paper • 2404.07987 • Published Apr 11 • 47

Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models

Paper • 2404.07724 • Published Apr 11 • 13

ByteEdit: Boost, Comply and Accelerate Generative Image Editing

Paper • 2404.04860 • Published Apr 7 • 24

SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing

Paper • 2404.05717 • Published Apr 8 • 24

YaART: Yet Another ART Rendering Technology

Paper • 2404.05666 • Published Apr 8 • 15

Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models

Paper • 2404.04478 • Published Apr 6 • 12

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

Paper • 2404.03653 • Published Apr 4 • 33

MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens

Paper • 2404.03413 • Published Apr 4 • 25

Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction

Paper • 2404.02905 • Published Apr 3 • 65

InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation

Paper • 2404.02733 • Published Apr 3 • 20

On the Scalability of Diffusion-based Text-to-Image Generation

Paper • 2404.02883 • Published Apr 3 • 17

Getting it Right: Improving Spatial Consistency in Text-to-Image Models

Paper • 2404.01197 • Published Apr 1 • 30

CosmicMan: A Text-to-Image Foundation Model for Humans

Paper • 2404.01294 • Published Apr 1 • 15

PuLID: Pure and Lightning ID Customization via Contrastive Alignment

Paper • 2404.16022 • Published Apr 24 • 21

ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning

Paper • 2404.15449 • Published Apr 23 • 11

ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving

Paper • 2404.16771 • Published Apr 25 • 16

Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings

Paper • 2404.16820 • Published Apr 25 • 15

MaPa: Text-driven Photorealistic Material Painting for 3D Shapes

Paper • 2404.17569 • Published Apr 26 • 12

Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling

Paper • 2405.21048 • Published May 31 • 13

BitsFusion: 1.99 bits Weight Quantization of Diffusion Model

Paper • 2406.04333 • Published Jun 6 • 36

Depth Anything V2

Paper • 2406.09414 • Published Jun 13 • 95

Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models

Paper • 2406.09416 • Published Jun 13 • 27

Interpreting the Weight Space of Customized Diffusion Models

Paper • 2406.09413 • Published Jun 13 • 18

EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts

Paper • 2406.09162 • Published Jun 13 • 13

Make It Count: Text-to-Image Generation with an Accurate Number of Objects

Paper • 2406.10210 • Published Jun 14 • 76

Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering

Paper • 2406.10208 • Published Jun 14 • 21

SHIC: Shape-Image Correspondences with no Keypoint Supervision

Paper • 2407.18907 • Published Jul 26 • 40

Matting by Generation

Paper • 2407.21017 • Published Jul 30 • 22

Imagen 3

Paper • 2408.07009 • Published Aug 13 • 61

ControlNeXt: Powerful and Efficient Control for Image and Video Generation

Paper • 2408.06070 • Published Aug 12 • 53

LinFusion: 1 GPU, 1 Minute, 16K Image

Paper • 2409.02097 • Published Sep 3 • 32

Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing

Paper • 2409.01322 • Published Sep 2 • 94

OmniGen: Unified Image Generation

Paper • 2409.11340 • Published Sep 17 • 108

Pixel-Space Post-Training of Latent Diffusion Models

Paper • 2409.17565 • Published Sep 26 • 20

MaskBit: Embedding-free Image Generation via Bit Tokens

Paper • 2409.16211 • Published Sep 24 • 16