Flux Edit

Example edit prompts:

  • Give this the look of a traditional Japanese woodblock print.
  • transform the setting to a winter scene
  • turn the color of mushroom to gray
  • Change it to look like it's in the style of an impasto painting.

These are Flux Control weights for image editing, fine-tuned from black-forest-labs/FLUX.1-dev on the TIGER-Lab/OmniEdit-Filtered-1.2M dataset. We use the Flux Control framework for fine-tuning.

License

Please adhere to the licensing terms described here.

Intended uses & limitations

Inference

from diffusers import FluxControlPipeline, FluxTransformer2DModel
from diffusers.utils import load_image
import torch 

path = "sayakpaul/FLUX.1-dev-edit-v0" 
edit_transformer = FluxTransformer2DModel.from_pretrained(path, torch_dtype=torch.bfloat16)
pipeline = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=edit_transformer, torch_dtype=torch.bfloat16
).to("cuda")

url = "https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/assets/mushroom.jpg"
image = load_image(url) # resize as needed.
print(image.size)

prompt = "turn the color of mushroom to gray"
image = pipeline(
    control_image=image,
    prompt=prompt,
    guidance_scale=30., # change this as needed.
    num_inference_steps=50, # change this as needed.
    max_sequence_length=512,
    height=image.height,
    width=image.width,
    generator=torch.manual_seed(0)
).images[0]
image.save("edited_image.png")
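The code above leaves resizing to the reader ("resize as needed"). Flux pipelines in Diffusers expect height and width to be divisible by 16 (the pipeline warns and adjusts other sizes), so one option is to snap the input image to such a size before editing. The helper `snap_to_multiple` below is hypothetical, and the 1024-pixel cap on the longer side is an assumption, not a documented limit of this checkpoint.

```python
def snap_to_multiple(width: int, height: int, multiple: int = 16,
                     max_side: int = 1024) -> tuple[int, int]:
    """Return a (width, height) pair that roughly keeps the aspect ratio,
    fits within max_side, and is divisible by `multiple`."""
    # Downscale (never upscale) so the longer side fits within max_side.
    scale = min(1.0, max_side / max(width, height))
    # Round each side down to the nearest multiple, keeping at least one multiple.
    new_w = max(multiple, int(width * scale) // multiple * multiple)
    new_h = max(multiple, int(height * scale) // multiple * multiple)
    return new_w, new_h


print(snap_to_multiple(1000, 750))   # → (992, 736)
print(snap_to_multiple(2048, 1536))  # → (1024, 768)
```

Since PIL's `Image.size` is `(width, height)` and `Image.resize` takes the same order, the helper can be applied right after loading, e.g. `image = image.resize(snap_to_multiple(*image.size))`.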

Speeding up inference with a turbo LoRA

We can speed up inference with a turbo LoRA such as ByteDance/Hyper-SD, which lets us lower num_inference_steps (from 50 to 8 in the example below) while still producing good images.

Make sure to install peft before running the code below: pip install -U peft.

from diffusers import FluxControlPipeline, FluxTransformer2DModel
from diffusers.utils import load_image
from huggingface_hub import hf_hub_download
import torch

path = "sayakpaul/FLUX.1-dev-edit-v0"
edit_transformer = FluxTransformer2DModel.from_pretrained(path, torch_dtype=torch.bfloat16)
pipeline = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=edit_transformer, torch_dtype=torch.bfloat16
).to("cuda")

# load the turbo LoRA
pipeline.load_lora_weights(
    hf_hub_download("ByteDance/Hyper-SD", "Hyper-FLUX.1-dev-8steps-lora.safetensors"), adapter_name="hyper-sd"
)
pipeline.set_adapters(["hyper-sd"], adapter_weights=[0.125])


url = "https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/assets/mushroom.jpg"
image = load_image(url) # resize as needed.
print(image.size)

prompt = "turn the color of mushroom to gray"
image = pipeline(
    control_image=image,
    prompt=prompt,
    guidance_scale=30., # change this as needed.
    num_inference_steps=8, # change this as needed.
    max_sequence_length=512,
    height=image.height,
    width=image.width,
    generator=torch.manual_seed(0)
).images[0]
image.save("edited_image.png")
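The turbo snippet sets adapter_weights=[0.125] for the Hyper-SD LoRA. As intuition for what that number does: a LoRA stores a low-rank update B @ A for selected weight matrices, and the adapter weight scales that update before it is added to the frozen base weight. The sketch below is a simplified illustration with random, hypothetical matrices; `apply_scaled_lora` is not a PEFT or Diffusers function, and PEFT additionally multiplies by lora_alpha / r, which is folded into `scale` here.

```python
import torch


def apply_scaled_lora(weight: torch.Tensor, lora_A: torch.Tensor,
                      lora_B: torch.Tensor, scale: float) -> torch.Tensor:
    """Return weight + scale * (lora_B @ lora_A).

    weight: (out_features, in_features), the frozen base weight
    lora_A: (rank, in_features), lora_B: (out_features, rank)
    """
    return weight + scale * (lora_B @ lora_A)


torch.manual_seed(0)
out_features, in_features, rank = 64, 32, 4  # hypothetical sizes
W = torch.randn(out_features, in_features, dtype=torch.float64)
A = torch.randn(rank, in_features, dtype=torch.float64)
B = torch.randn(out_features, rank, dtype=torch.float64)

W_full = apply_scaled_lora(W, A, B, scale=1.0)
W_turbo = apply_scaled_lora(W, A, B, scale=0.125)

# With scale 0.125 the weight moves one eighth as far from W as with scale 1.0.
print(torch.allclose(W_turbo - W, 0.125 * (W_full - W)))  # → True
```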

Comparison

Four example edits, each shown with 50 inference steps and with 8 steps using the turbo LoRA.

If the memory requirements still exceed what your hardware can provide, you can also quantize the model. Refer to the Diffusers documentation to learn more.
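A back-of-the-envelope estimate shows why quantization helps: weight memory is roughly parameter count × bits per parameter / 8. The sketch below assumes about 12 billion parameters for the FLUX.1-dev transformer (its published size) and counts weights only; activations, the VAE, and the text encoders (notably T5-XXL) add more on top. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes (1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: ~12 billion parameters for the FLUX.1-dev transformer.
FLUX_TRANSFORMER_PARAMS = 12e9

for label, bits in [("bf16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{label:>6}: ~{weight_memory_gb(FLUX_TRANSFORMER_PARAMS, bits):.0f} GB")
```

Going from bf16 to a 4-bit format cuts the transformer weights from roughly 24 GB to roughly 6 GB, at some cost in output quality that is worth checking on your own prompts.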

The guidance_scale (gs) also impacts the results. A collage compares gs values of 10, 20, 30, and 40 for each of these prompts:

  • Give this the look of a traditional Japanese woodblock print.
  • transform the setting to a winter scene
  • turn the color of mushroom to gray
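To run a sweep like this yourself, hold everything fixed except guidance_scale, including the seed. The hypothetical helper `guidance_sweep` below only builds the keyword arguments, reusing the settings from the inference example; it does not call the pipeline itself.

```python
def guidance_sweep(prompt: str, width: int, height: int,
                   scales=(10.0, 20.0, 30.0, 40.0),
                   num_inference_steps: int = 50) -> list[dict]:
    """Build one set of pipeline keyword arguments per guidance_scale value.

    Everything except guidance_scale is held fixed so the outputs are comparable.
    """
    return [
        {
            "prompt": prompt,
            "guidance_scale": float(gs),
            "num_inference_steps": num_inference_steps,
            "max_sequence_length": 512,
            "height": height,
            "width": width,
        }
        for gs in scales
    ]


runs = guidance_sweep("turn the color of mushroom to gray", width=1024, height=768)
for kwargs in runs:
    print(kwargs["guidance_scale"])
```

Each dictionary can then be passed as `pipeline(control_image=image, generator=torch.manual_seed(0), **kwargs)`, creating a fresh generator per call so every run starts from the same noise.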

Limitations and bias

The model may perform underwhelmingly, since we don't know the exact training details of Flux Control and our fine-tuning recipe may differ from the original.

Training details

The fine-tuning codebase is here. Training hyperparameters:

  • Per GPU batch size: 4
  • Gradient accumulation steps: 4
  • Guidance scale: 30
  • BF16 mixed precision
  • 8-bit AdamW optimizer (from bitsandbytes)
  • Constant learning rate of 5e-5
  • Weight decay of 1e-6
  • 20000 training steps

Training was conducted using a node of 8xH100s.
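The hyperparameters above imply an effective batch size per optimizer step. The sketch below assumes plain data-parallel training in which all 8 GPUs and all accumulation steps contribute to each update; that setup is an assumption, not something stated in the card.

```python
# Values from the training hyperparameters listed above.
per_gpu_batch_size = 4
gradient_accumulation_steps = 4
num_gpus = 8  # one node of 8xH100s; assumes data parallelism across all GPUs
training_steps = 20_000

effective_batch_size = per_gpu_batch_size * gradient_accumulation_steps * num_gpus
samples_seen = effective_batch_size * training_steps

print(effective_batch_size)  # → 128
print(samples_seen)          # → 2560000
```

If all of the roughly 1.2M filtered OmniEdit pairs were used, about 2.56M samples corresponds to a little over two passes through the data.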

We used a simplified flow-matching scheme that linearly interpolates between the clean latents and noise. In pseudo-code, that looks like:

sigmas = torch.rand(batch_size)
timesteps = (sigmas * noise_scheduler.config.num_train_timesteps).long()
...

noisy_model_input = (1.0 - sigmas) * pixel_latents + sigmas * noise

where pixel_latents is computed from the source images and noise is drawn from a Gaussian distribution. For more details, check out the repository.
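The pseudo-code above can be turned into a small runnable sketch. `interpolate` and `add_flow_noise` are hypothetical helpers; the toy latent shape, the default of 1000 training timesteps, and the uniform sigma sampling mirror the pseudo-code rather than the exact training script. The main detail the sketch makes explicit is reshaping the per-example sigmas so they broadcast over the latent dimensions. The velocity target mentioned in the comments is an assumption based on the usual rectified-flow convention.

```python
import torch


def interpolate(pixel_latents: torch.Tensor, noise: torch.Tensor,
                sigmas: torch.Tensor) -> torch.Tensor:
    """Linear interpolation between clean latents (sigma = 0) and noise (sigma = 1)."""
    # Reshape sigmas from (B,) to (B, 1, 1, ...) so they broadcast over every latent dim.
    s = sigmas.view(-1, *([1] * (pixel_latents.ndim - 1)))
    return (1.0 - s) * pixel_latents + s * noise


def add_flow_noise(pixel_latents: torch.Tensor, num_train_timesteps: int = 1000,
                   generator=None):
    """Sample one sigma per example, derive integer timesteps, and noise the latents."""
    batch_size = pixel_latents.shape[0]
    sigmas = torch.rand(batch_size, generator=generator)  # uniform in [0, 1)
    timesteps = (sigmas * num_train_timesteps).long()     # integers in [0, num_train_timesteps)
    noise = torch.randn(pixel_latents.shape, generator=generator)
    noisy_model_input = interpolate(pixel_latents, noise, sigmas)
    # Assumption (standard rectified-flow convention, not stated in the card):
    # the model would be trained to predict noise - pixel_latents.
    return noisy_model_input, noise, sigmas, timesteps


g = torch.Generator().manual_seed(0)
latents = torch.randn(2, 16, 8, 8, generator=g)  # hypothetical (B, C, H, W) latents
noisy, noise, sigmas, timesteps = add_flow_noise(latents, generator=g)
print(noisy.shape, timesteps.tolist())
```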
