Diffusers documentation

Stable diffusion XL

You are viewing v0.18.2 version. A newer version v0.32.1 is available.

Stable Diffusion XL was proposed in "SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis" by Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach.

The abstract of the paper is the following:

We present SDXL, a latent diffusion model for text-to-image synthesis. Compared to previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone: The increase of model parameters is mainly due to more attention blocks and a larger cross-attention context as SDXL uses a second text encoder. We design multiple novel conditioning schemes and train SDXL on multiple aspect ratios. We also introduce a refinement model which is used to improve the visual fidelity of samples generated by SDXL using a post-hoc image-to-image technique. We demonstrate that SDXL shows drastically improved performance compared the previous versions of Stable Diffusion and achieves results competitive with those of black-box state-of-the-art image generators.

Tips

Stable Diffusion XL works especially well with image sizes between 768 and 1024 pixels.
The output image of Stable Diffusion XL can be improved by using a refiner, as shown below.
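The size guidance above can be turned into a small helper. The sketch below is an illustration only and not part of diffusers: `snap_size` is a hypothetical name, and it assumes the 768 to 1024 range from the tip together with the pipeline's check that `height` and `width` are divisible by 8.

```python
def snap_size(value: int, low: int = 768, high: int = 1024, multiple: int = 8) -> int:
    """Clamp a side length to [low, high], then round down to a multiple of `multiple`."""
    clamped = max(low, min(high, value))
    return clamped - (clamped % multiple)


# Hypothetical requested sizes; the results can be passed as width=/height= to the pipeline call.
width, height = snap_size(1000), snap_size(700)
print(width, height)
```

With the default bounds, both limits are already multiples of 8, so rounding down never leaves the recommended range.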

Available checkpoints:

Text-to-Image (1024x1024 resolution): stabilityai/stable-diffusion-xl-base-0.9 with StableDiffusionXLPipeline
Image-to-Image / Refiner (1024x1024 resolution): stabilityai/stable-diffusion-xl-refiner-0.9 with StableDiffusionXLImg2ImgPipeline

Usage Example

Before using SDXL, make sure that transformers, accelerate, safetensors, and invisible-watermark are installed. You can install the libraries as follows:

pip install transformers
pip install accelerate
pip install safetensors
pip install "invisible-watermark>=2.0"
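After installing, it can help to confirm that the packages are importable from the environment you plan to run in. The following is a minimal sketch using only the standard library; `missing_packages` and `REQUIRED` are hypothetical names, and the mapping assumes that invisible-watermark is imported as `imwatermark`.

```python
import importlib.util

# pip name -> import name. Assumption: invisible-watermark is imported as `imwatermark`.
REQUIRED = {
    "transformers": "transformers",
    "accelerate": "accelerate",
    "safetensors": "safetensors",
    "invisible-watermark": "imwatermark",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of the packages whose import name cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


print("Missing:", missing_packages() or "none")
```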

Text-to-Image

You can use SDXL as follows for text-to-image:

from diffusers import StableDiffusionXLPipeline
import torch

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipe.to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt=prompt).images[0]

Refining the image output

The image can be refined by making use of stabilityai/stable-diffusion-xl-refiner-0.9. In this case, you only have to output the latents from the base model.

from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline
import torch

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipe.to("cuda")

use_refiner = True
refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-0.9", torch_dtype=torch.float16, use_safetensors=True, variant="fp16"
)
refiner.to("cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"

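# Request latents from the base model when refining; otherwise decode directly to a PIL image.
image = pipe(prompt=prompt, output_type="latent" if use_refiner else "pil").images[0]
if use_refiner:
    # Add a batch dimension to the single latent before passing it to the refiner.
    image = refiner(prompt=prompt, image=image[None, :]).images[0]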

Image-to-image

import torch
from diffusers import StableDiffusionXLImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-0.9", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")
url = "https://huggingface.co/datasets/patrickvonplaten/images/resolve/main/aa_xl/000000009.png"

init_image = load_image(url).convert("RGB")
prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt, image=init_image).images[0]

[Figure: original image and refined image, side by side]

Loading single file checkpoints / original file format

By making use of from_single_file() you can also load the original file format into diffusers:

from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline
import torch

# The file paths below are placeholders: point them at your local copies of the original checkpoint files.
pipe = StableDiffusionXLPipeline.from_single_file(
    "./sd_xl_base_0.9.safetensors", torch_dtype=torch.float16
)
pipe.to("cuda")

refiner = StableDiffusionXLImg2ImgPipeline.from_single_file(
    "./sd_xl_refiner_0.9.safetensors", torch_dtype=torch.float16
)
refiner.to("cuda")

Memory optimization via model offloading

If you are seeing out-of-memory errors, we recommend making use of StableDiffusionXLPipeline.enable_model_cpu_offload().

- pipe.to("cuda")
+ pipe.enable_model_cpu_offload()

and

- refiner.to("cuda")
+ refiner.enable_model_cpu_offload()
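The two substitutions above can be wrapped in one helper so that the same script runs on both large and small GPUs. This is a sketch under stated assumptions: `place_on_device` and its `low_vram` flag are hypothetical names, and the helper only calls the `to()` and `enable_model_cpu_offload()` methods already used on this page.

```python
def place_on_device(pipeline, low_vram: bool = False):
    """Move a pipeline to the GPU, or enable model CPU offloading when memory is tight."""
    if low_vram:
        # Offloading replaces the explicit move to the GPU, as in the diffs above.
        pipeline.enable_model_cpu_offload()
    else:
        pipeline.to("cuda")
    return pipeline


# Usage, with `pipe` and `refiner` created as in the earlier examples:
# place_on_device(pipe, low_vram=True)
# place_on_device(refiner, low_vram=True)
```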

Speed up inference with `torch.compile`

You can speed up inference with torch.compile, which requires torch 2.0 or newer. This should give a speed-up of roughly 20%.

+ pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead", fullgraph=True)
+ refiner.unet = torch.compile(refiner.unet, mode="reduce-overhead", fullgraph=True)
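The actual speed-up depends on your hardware, so it is worth measuring before and after compiling. Below is a minimal timing sketch using only the standard library; `mean_seconds` is a hypothetical helper, and the workload shown is a stand-in for a real pipeline call.

```python
import time


def mean_seconds(fn, warmup: int = 1, runs: int = 3) -> float:
    """Average wall-clock time of fn() over `runs` calls, after `warmup` untimed calls."""
    for _ in range(warmup):
        fn()  # untimed: the first call to a compiled module includes compilation time
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs


# Stand-in workload; in practice pass something like `lambda: pipe(prompt=prompt)`.
print(f"{mean_seconds(lambda: sum(range(100_000))):.6f} s per call")
```

Keeping at least one warm-up call matters here, because otherwise the one-time compilation cost is counted against the compiled model.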

Running with `torch` < 2.0

Note that if you want to run Stable Diffusion XL with torch < 2.0, please make sure to enable xformers attention:

pip install xformers

+ pipe.enable_xformers_memory_efficient_attention()
+ refiner.enable_xformers_memory_efficient_attention()
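If one script has to run under both older and newer torch versions, the check can be made at runtime. The sketch below is an assumption-laden illustration: `needs_xformers` is a hypothetical helper that only inspects the major version in a version string such as `torch.__version__`.

```python
def needs_xformers(torch_version: str) -> bool:
    """Return True when the major version of torch is below 2."""
    major = int(torch_version.split(".")[0])
    return major < 2


# Example version strings, including a local build suffix such as "+cu117".
print(needs_xformers("1.13.1+cu117"), needs_xformers("2.0.1"))  # True False

# Usage, with `pipe` and `refiner` created as in the earlier examples:
# if needs_xformers(torch.__version__):
#     pipe.enable_xformers_memory_efficient_attention()
#     refiner.enable_xformers_memory_efficient_attention()
```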
