sayakpaul (HF staff) committed on
Commit ab9565b · verified · 1 Parent(s): 29dc7d4

Update README.md

Files changed (1):
  1. README.md +180 -10

README.md CHANGED
@@ -16,10 +16,11 @@ tags:
  should probably proofread and complete it, then remove this comment. -->


- # flux-control-sayakpaul/omniflux-lr_5e-5-wd_1e-6-gs_30.0-cd_0.0-scheduler_constant-simplied_flow

- These are Control weights trained on black-forest-labs/FLUX.1-dev and [TIGER-Lab/OmniEdit-Filtered-1.2M](https://huggingface.co/datasets/TIGER-Lab/OmniEdit-Filtered-1.2M).
- You can find some example images below.


  prompt: Give this the look of a traditional Japanese woodblock print.
  ![images_0)](./images_0.png)
@@ -62,16 +63,185 @@ Please adhere to the licensing terms as described [here](https://huggingface.co/

  ## Intended uses & limitations

- #### How to use
-
- ```python
- # TODO: add an example code snippet for running this diffusion pipeline
  ```

- #### Limitations and bias

- [TODO: provide examples of latent issues and potential remediations]

  ## Training details

- [TODO: describe the data used to train the model]

  should probably proofread and complete it, then remove this comment. -->


+ # Flux Edit

+ These are the control weights trained on [black-forest-labs/FLUX.1-dev](https://hf.co/black-forest-labs/FLUX.1-dev)
+ and [TIGER-Lab/OmniEdit-Filtered-1.2M](https://huggingface.co/datasets/TIGER-Lab/OmniEdit-Filtered-1.2M) for image editing. We use the
+ [Flux Control framework](https://blackforestlabs.ai/flux-1-tools/) for fine-tuning.

  prompt: Give this the look of a traditional Japanese woodblock print.
  ![images_0)](./images_0.png)
 

  ## Intended uses & limitations

+ ### Inference
+
+ ```py
+ from diffusers import FluxControlPipeline, FluxTransformer2DModel
+ from diffusers.utils import load_image
+ import torch
+
+ path = "sayakpaul/FLUX.1-dev-edit-v0"  # to change
+ edit_transformer = FluxTransformer2DModel.from_pretrained(path, torch_dtype=torch.bfloat16)
+ pipeline = FluxControlPipeline.from_pretrained(
+     "black-forest-labs/FLUX.1-dev", transformer=edit_transformer, torch_dtype=torch.bfloat16
+ ).to("cuda")
+
+ image = load_image("./assets/mushroom.jpg")  # resize as needed.
+ print(image.size)
+
+ prompt = "turn the color of mushroom to gray"
+ image = pipeline(
+     control_image=image,
+     prompt=prompt,
+     guidance_scale=30.,  # change this as needed.
+     num_inference_steps=50,  # change this as needed.
+     max_sequence_length=512,
+     height=image.height,
+     width=image.width,
+     generator=torch.manual_seed(0)
+ ).images[0]
+ image.save("edited_image.png")
  ```

+ ### Speeding up inference with a turbo LoRA
+
+ We can speed up inference by using a turbo LoRA like [`ByteDance/Hyper-SD`](https://hf.co/ByteDance/Hyper-SD), which lets us reduce `num_inference_steps` while still producing a good image.
+
+ Make sure to install `peft` before running the code below: `pip install -U peft`.
+
+ <details>
+ <summary>Code</summary>
+
+ ```py
+ from diffusers import FluxControlPipeline, FluxTransformer2DModel
+ from diffusers.utils import load_image
+ from huggingface_hub import hf_hub_download
+ import torch
+
+ path = "sayakpaul/FLUX.1-dev-edit-v0"  # to change
+ edit_transformer = FluxTransformer2DModel.from_pretrained(path, torch_dtype=torch.bfloat16)
+ control_pipe = FluxControlPipeline.from_pretrained(
+     "black-forest-labs/FLUX.1-dev", transformer=edit_transformer, torch_dtype=torch.bfloat16
+ ).to("cuda")
+
+ # load the turbo LoRA
+ control_pipe.load_lora_weights(
+     hf_hub_download("ByteDance/Hyper-SD", "Hyper-FLUX.1-dev-8steps-lora.safetensors"), adapter_name="hyper-sd"
+ )
+ control_pipe.set_adapters(["hyper-sd"], adapter_weights=[0.125])
+
+ image = load_image("./assets/mushroom.jpg")  # resize as needed.
+ print(image.size)
+
+ prompt = "turn the color of mushroom to gray"
+ image = control_pipe(
+     control_image=image,
+     prompt=prompt,
+     guidance_scale=30.,  # change this as needed.
+     num_inference_steps=8,  # change this as needed.
+     max_sequence_length=512,
+     height=image.height,
+     width=image.width,
+     generator=torch.manual_seed(0)
+ ).images[0]
+ image.save("edited_image.png")
+ ```

+ </details>
+ <br><br>
+ <details>
+ <summary>Comparison</summary>
+
+ <table align="center">
+ <tr>
+ <th>50 steps</th>
+ <th>8 steps</th>
+ </tr>
+ <tr>
+ <td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_car.jpg" alt="50 steps 1" width="150"></td>
+ <td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_8steps_car.jpg" alt="8 steps 1" width="150"></td>
+ </tr>
+ <tr>
+ <td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_norte_dam.jpg" alt="50 steps 2" width="150"></td>
+ <td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_8steps_norte_dam.jpg" alt="8 steps 2" width="150"></td>
+ </tr>
+ <tr>
+ <td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_mushroom.jpg" alt="50 steps 3" width="150"></td>
+ <td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_8steps_mushroom.jpg" alt="8 steps 3" width="150"></td>
+ </tr>
+ <tr>
+ <td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_green_creature.jpg" alt="50 steps 4" width="150"></td>
+ <td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_8steps_green_creature.jpg" alt="8 steps 4" width="150"></td>
+ </tr>
+ </table>
+
+ </details>
+
+ You can also quantize the model if your hardware cannot otherwise satisfy the memory requirements. Refer to the [Diffusers documentation](https://huggingface.co/docs/diffusers/main/en/quantization/overview) to learn more; a small sketch follows below.
+
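As an illustrative sketch (not part of the original card), the edit transformer can be loaded in 8-bit with `bitsandbytes` before building the pipeline. This assumes a recent `diffusers` release with bitsandbytes support and `pip install -U bitsandbytes`:

```py
from diffusers import BitsAndBytesConfig, FluxControlPipeline, FluxTransformer2DModel
import torch

path = "sayakpaul/FLUX.1-dev-edit-v0"  # to change

# Quantize only the transformer (the largest component) to 8-bit to reduce memory use.
quant_config = BitsAndBytesConfig(load_in_8bit=True)
edit_transformer = FluxTransformer2DModel.from_pretrained(
    path, quantization_config=quant_config, torch_dtype=torch.bfloat16
)

pipeline = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", transformer=edit_transformer, torch_dtype=torch.bfloat16
)
pipeline.enable_model_cpu_offload()  # offload idle sub-models to CPU for additional savings
```
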
+ `guidance_scale` also impacts the results:
+
+ <table align="center">
+ <tr>
+ <th>Source Image</th>
+ <th>Edited Image (gs: 10)</th>
+ <th>Edited Image (gs: 20)</th>
+ <th>Edited Image (gs: 30)</th>
+ <th>Edited Image (gs: 40)</th>
+ </tr>
+ <tr>
+ <td align="center">
+ <img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/assets/car.jpg" alt="Source Image 1" width="150"><br>
+ <em>Give this the look of a traditional Japanese woodblock print.</em>
+ </td>
+ <td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_gs-10_car.jpg" alt="Edited Image gs 10" width="150"></td>
+ <td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_gs-20_car.jpg" alt="Edited Image gs 20" width="150"></td>
+ <td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_gs-30_car.jpg" alt="Edited Image gs 30" width="150"></td>
+ <td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_gs-40_car.jpg" alt="Edited Image gs 40" width="150"></td>
+ </tr>
+ <tr>
+ <td align="center">
+ <img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/assets/green_creature" alt="Source Image 2" width="150"><br>
+ <em>transform the setting to a winter scene</em>
+ </td>
+ <td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_gs-10_green_creature.jpg" alt="Edited Image gs 10" width="150"></td>
+ <td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_gs-20_green_creature.jpg" alt="Edited Image gs 20" width="150"></td>
+ <td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_gs-30_green_creature.jpg" alt="Edited Image gs 30" width="150"></td>
+ <td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_gs-40_green_creature.jpg" alt="Edited Image gs 40" width="150"></td>
+ </tr>
+ <tr>
+ <td align="center">
+ <img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/assets/mushroom.jpg" alt="Source Image 3" width="150"><br>
+ <em>turn the color of mushroom to gray</em>
+ </td>
+ <td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_gs-10_mushroom.jpg" alt="Edited Image gs 10" width="150"></td>
+ <td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_gs-20_mushroom.jpg" alt="Edited Image gs 20" width="150"></td>
+ <td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_gs-30_mushroom.jpg" alt="Edited Image gs 30" width="150"></td>
+ <td align="center"><img src="https://huggingface.co/datasets/sayakpaul/sample-datasets/resolve/main/flux-edit-artifacts/edited_gs-40_mushroom.jpg" alt="Edited Image gs 40" width="150"></td>
+ </tr>
+ </table>
+
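A comparison like the one above can be reproduced by sweeping `guidance_scale` with the inference snippet; the sketch below reuses the `pipeline` object from that snippet together with the mushroom image and prompt used earlier:

```py
from diffusers.utils import load_image
import torch

image = load_image("./assets/mushroom.jpg")
prompt = "turn the color of mushroom to gray"

# Generate one edit per guidance scale with a fixed seed so only `guidance_scale` varies.
for gs in (10.0, 20.0, 30.0, 40.0):
    edited = pipeline(
        control_image=image,
        prompt=prompt,
        guidance_scale=gs,
        num_inference_steps=50,
        max_sequence_length=512,
        height=image.height,
        width=image.width,
        generator=torch.manual_seed(0),
    ).images[0]
    edited.save(f"edited_gs-{int(gs)}.png")
```
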
+ ### Limitations and bias
+
+ Expect the model to perform underwhelmingly at times, since we do not know the exact training details of Flux Control.

  ## Training details

+ The fine-tuning codebase is available [here](https://github.com/sayakpaul/flux-image-editing). Training hyperparameters:
+
+ * Per-GPU batch size: 4
+ * Gradient accumulation steps: 4
+ * Guidance scale: 30
+ * BF16 mixed precision
+ * AdamW optimizer (8-bit, from `bitsandbytes`)
+ * Constant learning rate of 5e-5
+ * Weight decay of 1e-6
+ * 20,000 training steps
+
+ Training was conducted using a node of 8×H100s; a small sketch of the optimizer setup implied by these hyperparameters follows below.
+
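For illustration only, here is a minimal sketch of the optimizer and schedule the hyperparameters above imply (8-bit AdamW from `bitsandbytes`, constant learning rate, small weight decay). The `transformer` below is a stand-in module; the exact setup lives in the fine-tuning repository linked above.

```py
import bitsandbytes as bnb
import torch
from torch.optim.lr_scheduler import LambdaLR

transformer = torch.nn.Linear(8, 8)  # stand-in for the Flux transformer being fine-tuned

# 8-bit AdamW with the learning rate and weight decay listed above.
optimizer = bnb.optim.AdamW8bit(transformer.parameters(), lr=5e-5, weight_decay=1e-6)

# Constant schedule: the learning-rate multiplier stays at 1.0 for every step.
lr_scheduler = LambdaLR(optimizer, lr_lambda=lambda step: 1.0)
```
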
+ We used a simplified flow mechanism to perform the linear interpolation. In pseudo-code, that looks like:
+
+ ```py
+ sigmas = torch.rand(batch_size)
+ timesteps = (sigmas * noise_scheduler.config.num_train_timesteps).long()
+ ...
+
+ noisy_model_input = (1.0 - sigmas) * pixel_latents + sigmas * noise
+ ```
+
+ where `pixel_latents` is computed from the source images and `noise` is drawn from a Gaussian distribution. For more details, check out
+ the repository.
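
To make the pseudo-code above concrete, here is a small self-contained sketch with dummy tensors. The velocity target `noise - pixel_latents` and the random stand-in prediction are illustrative assumptions; the actual training step is in the linked repository.

```py
import torch
import torch.nn.functional as F

batch_size, channels, height, width = 2, 16, 64, 64
num_train_timesteps = 1000  # stand-in for noise_scheduler.config.num_train_timesteps

# Stand-ins for the source-image latents and Gaussian noise.
pixel_latents = torch.randn(batch_size, channels, height, width)
noise = torch.randn_like(pixel_latents)

# Simplified flow: draw sigmas uniformly and interpolate linearly between latents and noise.
sigmas = torch.rand(batch_size)
timesteps = (sigmas * num_train_timesteps).long()
sigmas = sigmas.view(-1, 1, 1, 1)  # broadcast over (C, H, W)
noisy_model_input = (1.0 - sigmas) * pixel_latents + sigmas * noise

# Under the usual flow-matching objective, the model predicts the velocity
# (noise - pixel_latents); a random tensor stands in for the model output here.
model_pred = torch.randn_like(pixel_latents)
loss = F.mse_loss(model_pred, noise - pixel_latents)
print(loss.item())
```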