byliutao committed
Commit 31c1396 · verified · 1 Parent(s): 26fa94b

Upload folder using huggingface_hub
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ resource/photo.gif filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1,3 @@
+ result
+ **__pycache__
+ .gradio
LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2024 FoundationVision
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
README.md CHANGED
@@ -1,12 +1,90 @@
  ---
  title: 1Prompt1Story
- emoji: 👁
- colorFrom: indigo
- colorTo: indigo
- sdk: gradio
- sdk_version: 5.13.0
  app_file: app.py
- pinned: false
+ sdk: gradio
+ sdk_version: 4.44.1
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ <h1 align="center">
+ 🔥 One-Prompt-One-Story: Free-Lunch Consistent Text-to-Image Generation Using a Single Prompt
+ <br>
+ </h1>
+
+ <div align="center">
+
+ [![demo](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Online_Demo-blue)]()&nbsp;
+ [![arXiv](https://img.shields.io/badge/arXiv%20paper-2406.06525-b31b1b.svg)]()&nbsp;
+ [![project page](https://img.shields.io/badge/Project_page-More_visualizations-green)]()&nbsp;
+
+ </div>
+
+ <p align="center">
+ <a href="#key-features">Key Features</a> •
+ <a href="#how-to-use">How To Use</a> •
+ <a href="#license">License</a> •
+ <a href="#citation">Citation</a>
+ </p>
+
+ <p align="center">
+ <img src="./resource/photo.gif" alt="screenshot" />
+ </p>
+
+ ## Key Features
+
+ * Consistent Identity Image Generation.
+ * Gradio Demo.
+ * Consistory+ Benchmark: 200 prompt sets, each containing between 5 and 10 prompts, categorized into 8 superclasses (humans, animals, fantasy, foods, inanimate, fairy tales, nature, technology); see the expansion sketch after this list.
+ * Benchmark Generation Code.
+
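+ Each Consistory+ entry pairs a `subject`, a `style`, and a list of `settings` (see `resource/consistory+.yaml`). Below is a minimal sketch of how one entry can be expanded into the `id_prompt`/`frame_prompt_list` inputs that `main.py` consumes; the exact composition used by `resource.gen_benchmark` is not shown in this commit, so treat the string format as an assumption:
+
+ ```python
+ import yaml  # requires: pip install pyyaml
+
+ with open("resource/consistory+.yaml", encoding="utf-8") as f:
+     benchmark = yaml.safe_load(f)
+
+ entry = benchmark["animals"][0]  # first prompt set in the "animals" superclass
+ # Assumed composition: identity prompt = style + subject; frame prompts = settings
+ id_prompt = f"{entry['style']} {entry['subject']}"
+ frame_prompt_list = entry["settings"]
+ print(id_prompt)             # A fiery and majestic illustration of A phoenix with bright orange feathers
+ print(frame_prompt_list[0])  # rising from a fiery ashes
+ ```
+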
46
+
47
+ ## How To Use
48
+
49
+ To clone and run this application, you'll need [Git](https://git-scm.com) and [Node.js](https://nodejs.org/en/download/) (which comes with [npm](http://npmjs.com)) installed on your computer. From your command line:
50
+
51
+ ```bash
52
+ # Clone this repository
53
+ $ git clone https://github.com/byliutao/1Prompt1Story
54
+
55
+ # Go into the repository
56
+ $ cd 1Prompt1Story
57
+
58
+ ### Install dependencies ###
59
+ $ conda create --name 1p1s python=3.10
60
+ $ conda activate 1p1s
61
+ # choose the right cuda version of your device
62
+ $ conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
63
+ $ conda install conda-forge::transformers
64
+ $ conda install -c conda-forge diffusers
65
+ $ pip install opencv-python scipy gradio=4.44.1 sympy==1.13.1
66
+ ### Install dependencies ENDs ###
67
+
68
+ # Run sample code
69
+ $ python main.py
70
+
71
+ # Run gradio demo
72
+ $ python app.py
73
+
74
+ # Run Consistory+ benchmark
75
+ $ python -m resource.gen_benchmark --save_dir ./result/benchmark --benchmark_path ./resource/consistory+.yaml
76
+ ```
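+
+ `main.py` also accepts custom prompts directly; a minimal invocation using the flags it defines (`--id_prompt`, `--frame_prompt_list`, `--seed`):
+
+ ```bash
+ $ python main.py --device cuda:0 \
+     --id_prompt "A photo of a red fox" \
+     --frame_prompt_list "wearing a scarf in a meadow" "playing in the snow" \
+     --seed 42
+ ```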
+
+ ## License
+ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
+
+ ## Citation
+ If our work assists your research, feel free to give us a star ⭐ or cite us using:
+ ```
+ arxiv
+ ```
app.py ADDED
@@ -0,0 +1,217 @@
+ import gradio as gr
+ import diffusers
+ import random
+ import json
+ import os
+ import gc
+ from datetime import datetime
+
+ diffusers.utils.logging.set_verbosity_error()
+ import torch
+ from PIL import Image
+ import numpy as np
+
+ from unet.unet_controller import UNetController
+ from main import load_unet_controller
+ from unet import utils
+
+
+ # Global flag to control interruption
+ interrupt_flag = False
+
+
+ def main_gradio(model_path, id_prompt, frame_prompt_list, precision, seed, window_length, alpha_weaken, beta_weaken, alpha_enhance, beta_enhance, ipca_drop_out, use_freeu, use_same_init_noise):
+     global interrupt_flag
+     interrupt_flag = False  # Reset the flag at the start of the function
+
+     if seed == -1:
+         seed = random.randint(0, 2**32 - 1)
+     frame_prompt_list = frame_prompt_list.split(",")
+     pipe, _ = utils.load_pipe_from_path(model_path, "cuda:1", torch.float16 if precision == "fp16" else torch.float32, precision)
+
+     if interrupt_flag:
+         print("Generation interrupted")
+         del pipe
+         torch.cuda.empty_cache()
+
+         if 'story_image' not in locals():
+             empty_image = Image.fromarray(np.zeros((512, 512, 3), dtype=np.uint8))
+             yield empty_image
+
+         return
+
+     unet_controller = load_unet_controller(pipe, "cuda:1")
+     unet_controller.Alpha_enhance = alpha_enhance
+     unet_controller.Beta_enhance = beta_enhance
+     unet_controller.Alpha_weaken = alpha_weaken
+     unet_controller.Beta_weaken = beta_weaken
+     unet_controller.Ipca_dropout = ipca_drop_out
+     unet_controller.Is_freeu_enabled = use_freeu
+     unet_controller.Use_same_init_noise = use_same_init_noise
+
+     current_time = datetime.now().strftime("%Y%m%d%H")
+     current_time_ = datetime.now().strftime("%M%S")
+     save_dir = os.path.join(".", f'result/{current_time}/{current_time_}_gradio_seed{seed}')
+     os.makedirs(save_dir, exist_ok=True)
+
+     # First pass: run the identity prompt alone so the controller stores its Q/K/V (Store_qkv) for IPCA
+     generate = torch.Generator().manual_seed(seed)
+     if unet_controller.Use_ipca is True:
+         unet_controller.Store_qkv = True
+         original_prompt_embeds_mode = unet_controller.Prompt_embeds_mode
+         unet_controller.Prompt_embeds_mode = "original"
+         _ = pipe(id_prompt, generator=generate, unet_controller=unet_controller).images
+         unet_controller.Prompt_embeds_mode = original_prompt_embeds_mode
+
+     unet_controller.Store_qkv = False
+     max_window_length = utils.get_max_window_length(unet_controller, id_prompt, frame_prompt_list)
+     window_length = min(window_length, max_window_length)
+     if window_length < len(frame_prompt_list):
+         movement_lists = utils.circular_sliding_windows(frame_prompt_list, window_length)
+     else:
+         movement_lists = [movement for movement in frame_prompt_list]
+
+     story_image_list = []
+     generate = torch.Generator().manual_seed(seed)
+     unet_controller.id_prompt = id_prompt
+     for index, movement in enumerate(frame_prompt_list):
+         if interrupt_flag:
+             print("Generation interrupted")
+             del pipe
+             torch.cuda.empty_cache()
+
+             if 'story_image' not in locals():
+                 empty_image = Image.fromarray(np.zeros((512, 512, 3), dtype=np.uint8))
+                 yield empty_image
+
+             return
+
+         # Express the current frame prompt and suppress the others in the window
+         if unet_controller is not None:
+             if window_length < len(frame_prompt_list):
+                 unet_controller.frame_prompt_suppress = movement_lists[index][1:]
+                 unet_controller.frame_prompt_express = movement_lists[index][0]
+                 gen_prompts = [f'{id_prompt} {" ".join(movement_lists[index])}']
+             else:
+                 unet_controller.frame_prompt_suppress = movement_lists[:index] + movement_lists[index+1:]
+                 unet_controller.frame_prompt_express = movement_lists[index]
+                 gen_prompts = [f'{id_prompt} {" ".join(movement_lists)}']
+         else:
+             gen_prompts = f'{id_prompt} {movement}'
+
+         print(f"suppress: {unet_controller.frame_prompt_suppress}")
+         print(f"express: {unet_controller.frame_prompt_express}")
+         print(f'id_prompt: {id_prompt}')
+         print(f"gen_prompts: {gen_prompts}")
+
+         if unet_controller is not None and unet_controller.Use_same_init_noise is True:
+             generate = torch.Generator().manual_seed(seed)
+
+         images = pipe(gen_prompts, generator=generate, unet_controller=unet_controller).images
+         story_image_list.append(images[0])
+
+         story_image = np.concatenate(story_image_list, axis=1)
+         story_image = Image.fromarray(story_image.astype(np.uint8))
+
+         # Yield the partial strip so the Gradio output updates after every frame
+         yield story_image
+         images[0].save(os.path.join(save_dir, f'{id_prompt} {unet_controller.frame_prompt_express}.jpg'))
+
+     story_image.save(os.path.join(save_dir, 'story_image.jpg'))
+
+     del pipe
+     gc.collect()
+     torch.cuda.empty_cache()
+
+
+ # Gradio interface
+ def gradio_interface():
+     global interrupt_flag
+
+     with gr.Blocks() as demo:
+         gr.Markdown("### Consistent Image Generation with 1Prompt1Story")
+
+         # Load example prompts
+         with open('./resource/example.json', 'r') as f:
+             data = json.load(f)
+
+         # Extract id_prompts and frame_prompts
+         id_prompts = [item['id_prompt'] for item in data['combinations']]
+         frame_prompts = [", ".join(item['frame_prompt_list']) for item in data['combinations']]
+
+         # Input fields
+         id_prompt = gr.Dropdown(
+             label="ID Prompt",
+             choices=id_prompts,
+             value=id_prompts[0],
+             allow_custom_value=True
+         )
+         frame_prompt_list = gr.Dropdown(
+             label="Frame Prompts (comma-separated)",
+             choices=frame_prompts,
+             value=frame_prompts[0],
+             allow_custom_value=True
+         )
+         model_path = gr.Dropdown(
+             label="Model Path",
+             choices=["stabilityai/stable-diffusion-xl-base-1.0", "RunDiffusion/Juggernaut-X-v10", "playgroundai/playground-v2.5-1024px-aesthetic", "SG161222/RealVisXL_V4.0", "RunDiffusion/Juggernaut-XI-v11", "SG161222/RealVisXL_V5.0"],
+             value="playgroundai/playground-v2.5-1024px-aesthetic",
+             allow_custom_value=True
+         )
+
+         with gr.Row():
+             seed = gr.Slider(label="Seed (set -1 for random seed)", minimum=-1, maximum=10000, value=-1, step=1)
+             window_length = gr.Slider(label="Window Length", minimum=1, maximum=20, value=10, step=1)
+
+         with gr.Row():
+             alpha_weaken = gr.Number(label="Alpha Weaken", value=UNetController.Alpha_weaken, interactive=True, step=0.01)
+             beta_weaken = gr.Number(label="Beta Weaken", value=UNetController.Beta_weaken, interactive=True, step=0.01)
+             alpha_enhance = gr.Number(label="Alpha Enhance", value=UNetController.Alpha_enhance, interactive=True, step=0.001)
+             beta_enhance = gr.Number(label="Beta Enhance", value=UNetController.Beta_enhance, interactive=True, step=0.1)
+
+         with gr.Row():
+             ipca_drop_out = gr.Number(label="Ipca Dropout", value=UNetController.Ipca_dropout, interactive=True, step=0.1, minimum=0, maximum=1)
+             precision = gr.Dropdown(label="Precision", choices=["fp16", "fp32"], value="fp16")
+             use_freeu = gr.Dropdown(label="Use FreeU", choices=[False, True], value=UNetController.Is_freeu_enabled)
+             use_same_init_noise = gr.Dropdown(label="Use Same Init Noise", choices=[True, False], value=UNetController.Use_same_init_noise)
+
+         reset_button = gr.Button("Reset to Default")
+
+         def reset_values():
+             return UNetController.Alpha_weaken, UNetController.Beta_weaken, UNetController.Alpha_enhance, UNetController.Beta_enhance, UNetController.Ipca_dropout, "fp16", UNetController.Is_freeu_enabled, UNetController.Use_same_init_noise
+
+         reset_button.click(
+             fn=reset_values,
+             inputs=[],
+             outputs=[alpha_weaken, beta_weaken, alpha_enhance, beta_enhance, ipca_drop_out, precision, use_freeu, use_same_init_noise]
+         )
+
+         # Output
+         output_gallery = gr.Image()
+
+         # Buttons
+         generate_button = gr.Button("Generate Images")
+         interrupt_button = gr.Button("Interrupt")
+
+         def interrupt_generation():
+             global interrupt_flag
+             interrupt_flag = True
+
+         interrupt_button.click(
+             fn=interrupt_generation,
+             inputs=[],
+             outputs=[]
+         )
+
+         generate_button.click(
+             fn=main_gradio,
+             inputs=[
+                 model_path, id_prompt, frame_prompt_list, precision, seed, window_length, alpha_weaken, beta_weaken, alpha_enhance, beta_enhance, ipca_drop_out, use_freeu, use_same_init_noise
+             ],
+             outputs=output_gallery
+         )
+
+     return demo
+
+
+ if __name__ == "__main__":
+     demo = gradio_interface()
+     demo.launch(share=True)
main.py ADDED
@@ -0,0 +1,97 @@
+ import os
+ import json
+ import torch
+ import random
+ import diffusers
+ import unet.utils as utils
+ from unet.unet_controller import UNetController
+ import argparse
+ from datetime import datetime
+
+ diffusers.utils.logging.set_verbosity_error()
+
+
+ def load_unet_controller(pipe, device):
+     unet_controller = UNetController()
+     unet_controller.device = device
+     unet_controller.tokenizer = pipe.tokenizer
+
+     return unet_controller
+
+
+ def generate_images(unet_controller: UNetController, pipe, id_prompt, frame_prompt_list, save_dir, window_length, seed, verbose=True):
+     # First pass: run the identity prompt alone so the controller stores its Q/K/V (Store_qkv) for IPCA
+     generate = torch.Generator().manual_seed(seed)
+     if unet_controller.Use_ipca is True:
+         unet_controller.Store_qkv = True
+         original_prompt_embeds_mode = unet_controller.Prompt_embeds_mode
+         unet_controller.Prompt_embeds_mode = "original"
+         _ = pipe(id_prompt, generator=generate, unet_controller=unet_controller).images
+         unet_controller.Prompt_embeds_mode = original_prompt_embeds_mode
+
+     # Second pass: generate each frame with a sliding window over the frame prompts
+     unet_controller.Store_qkv = False
+     images, story_image = utils.movement_gen_story_slide_windows(
+         id_prompt, frame_prompt_list, pipe, window_length, seed, unet_controller, save_dir, verbose=verbose
+     )
+
+     return images, story_image
+
+
+ def main(device, model_path, save_dir, id_prompt, frame_prompt_list, precision, seed, window_length):
+     pipe, _ = utils.load_pipe_from_path(model_path, device, torch.float16 if precision == "fp16" else torch.float32, precision)
+
+     unet_controller = load_unet_controller(pipe, device)
+     images, story_image = generate_images(unet_controller, pipe, id_prompt, frame_prompt_list, save_dir, window_length, seed)
+
+     return images, story_image
+
+
+ if __name__ == "__main__":
+     parser = argparse.ArgumentParser(description="Generate images using a specific device.")
+     parser.add_argument('--device', type=str, default='cuda:0', help='Device to use for computation (e.g., cuda:0, cpu)')
+     parser.add_argument('--model_path', type=str, default='playgroundai/playground-v2.5-1024px-aesthetic', help='Path to the model')
+     parser.add_argument('--project_base_path', type=str, default='.', help='Path to save the generated images')
+     parser.add_argument('--id_prompt', type=str, default="A photo of a red fox with coat", help='Initial prompt for image generation')
+     parser.add_argument('--frame_prompt_list', type=str, nargs='+', default=[
+         "wearing a scarf in a meadow",
+         "playing in the snow",
+         "at the edge of a village with river",
+     ], help='List of frame prompts')
+     parser.add_argument('--precision', type=str, choices=["fp16", "fp32"], default="fp16", help='Model precision')
+     parser.add_argument('--seed', type=int, default=42, help='Random seed for generation')
+     parser.add_argument('--window_length', type=int, default=10, help='Window length for story generation')
+     parser.add_argument('--save_padding', type=str, default='test', help='Padding for save directory')
+     parser.add_argument('--random_seed', action='store_true', help='Use random seed')
+     parser.add_argument('--json_path', type=str, help='Path to a JSON file of id_prompt/frame_prompt_list combinations')
+
+     args = parser.parse_args()
+     if args.random_seed:
+         args.seed = random.randint(0, 1000000)
+
+     current_time = datetime.now().strftime("%Y%m%d%H")
+     current_time_ = datetime.now().strftime("%M%S")
+     save_dir = os.path.join(args.project_base_path, f'result/{current_time}/{current_time_}_{args.save_padding}_seed{args.seed}')
+     os.makedirs(save_dir, exist_ok=True)
+
+     if args.json_path is None:
+         id_prompt = "A cinematic portrait of a man and a woman standing together"
+         frame_prompt_list = [
+             "under a sky full of stars",
+             "on a bustling city street at night",
+             "in a dimly lit jazz club",
+             "walking along a sandy beach at sunset",
+             "in a cozy coffee shop with large windows",
+             "in a vibrant art gallery surrounded by paintings",
+             "under an umbrella during a soft rain",
+             "on a quiet park bench amidst falling leaves",
+             "standing on a rooftop overlooking the city skyline"
+         ]
+         main(args.device, args.model_path, save_dir, id_prompt, frame_prompt_list, args.precision, args.seed, args.window_length)
+     else:
+         with open(args.json_path, "r") as file:
+             data = json.load(file)
+
+         combinations = data["combinations"]
+
+         for combo in combinations:
+             main(args.device, args.model_path, save_dir, combo['id_prompt'], combo['frame_prompt_list'], args.precision, args.seed, args.window_length)
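
> **Note** `main.py --json_path` (and `app.py`, which loads `./resource/example.json`) reads `data["combinations"]`: a list of objects, each with an `id_prompt` string and a `frame_prompt_list` array. A minimal example of that layout; the field names come from the code above, while the values here are illustrative:

```json
{
  "combinations": [
    {
      "id_prompt": "A photo of a red fox",
      "frame_prompt_list": ["wearing a scarf in a meadow", "playing in the snow"]
    }
  ]
}
```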
requirements.txt ADDED
@@ -0,0 +1,3 @@
+ torch
+ transformers
+ diffusers
resource/__init__.py ADDED
File without changes
resource/consistory+.yaml ADDED
@@ -0,0 +1,1894 @@
+ animals:
+ - concept_token: phoenix
+   settings:
+   - rising from a fiery ashes
+   - soaring through a glowing sky
+   - perching on a mountain peak
+   - singing a haunting melody
+   - igniting flames with its wings
+   style: A fiery and majestic illustration of
+   subject: A phoenix with bright orange feathers
+ - concept_token: zebra
+   settings:
+   - grazing alongside a river
+   - running in a herd across the plains
+   - resting under the shade of an acacia tree
+   - crossing a dusty path in the wild
+   - protecting its young in the savannah
+   style: A vibrant and striking portrait of
+   subject: A zebra with black and white stripes
+ - concept_token: cheetah
+   settings:
+   - sprinting across the savannah
+   - stalking a gazelle in the grass
+   - relaxing in the shade under a tree
+   - marking territory with its scent
+   - napping after a successful hunt
+   style: A sleek and fast depiction of
+   subject: A cheetah with sharp eyes
+ - concept_token: walrus
+   settings:
+   - lounging on an ice floe
+   - diving for clams in the sea
+   - bellowing from the shore
+   - resting near a snowy coastline
+   - swimming gracefully in the ocean
+   style: A massive and majestic depiction of
+   subject: A walrus with large tusks
+ - concept_token: zebra
+   settings:
+   - grazing under the sun in the savannah
+   - running with a herd through the grasslands
+   - standing near a watering hole
+   - crossing a river in the wild
+   - trotting beside a wildebeest
+   style: A bold and striking illustration of
+   subject: A zebra with black and white stripes
+ - concept_token: cheetah
+   settings:
+   - sprinting at top speed in the savannah
+   - resting in the shade of a tree
+   - hunting a gazelle in tall grass
+   - stretching before a sprint
+   - watching the horizon from a hill
+   style: A sleek and fast portrayal of
+   subject: A cheetah with spotted fur
+ - concept_token: gorilla
+   settings:
+   - sitting in a jungle clearing
+   - playing with a baby gorilla
+   - searching for fruit in the trees
+   - relaxing by a waterfall
+   - beating its chest to assert dominance
+   style: A strong and wise illustration of
+   subject: A gorilla with dark fur
+ - concept_token: dog
+   settings:
+   - chasing a frisbee
+   - dressed in a raincoat
+   - in a city alley
+   - jumping over a puddle
+   - at a veterinarian's office
+   - playing in a park
+   - sleeping on a couch
+   - running on a beach
+   style: A hyper-realistic digital painting of
+   subject: A dog
+ - concept_token: parrot
+   settings:
+   - in a tropical rainforest
+   - singing a song
+   - on the water
+   - sitting on a rock
+   - in a rain storm
+   - perched on a branch
+   - eating fruit
+   style: A watercolor illustration of
+   subject: A Scarlet parrot with vibrant red, yellow, and blue feathers
+ - concept_token: dog
+   settings:
+   - wearing a bandana
+   - on a beach
+   - in a city alley
+   - in a snowy backyard
+   - at a veterinarian's office
+   - playing with a toy
+   - barking at a squirrel
+   style: A 3D animation of
+   subject: A black and white dog with yellow collar
+ - concept_token: fox
+   settings:
+   - wearing a scarf
+   - dressed in a warm coat
+   - wearing a playful bow
+   - resting in a den
+   - near a campsite
+   - exploring a forest
+   - looking at the stars
+   style: A hyper-realistic digital painting of
+   subject: A red fox with a vibrant red coat, white belly, and bushy tail
+ - concept_token: puppy
+   settings:
+   - in a pet store
+   - eating his food
+   - wearing a training harness
+   - dressed in a bandana
+   - at a lake
+   - playing with a ball
+   - sleeping in a bed
+   style: A 3D animation of
+   subject: A cute Labrador puppy with a glossy, chocolate brown coat
+ - concept_token: dog
+   settings:
+   - wearing a bandana
+   - biting a bone
+   - wearing a birthday hat
+   - in a snowy backyard
+   - sitting by a fireplace
+   style: A 3D animation of
+   subject: A black and white dog with yellow collar
+ - concept_token: kitten
+   settings:
+   - in a garden
+   - dressed in a cute sweater
+   - wearing a collar with a bell
+   - dressed in a superhero cape
+   - running through a field
+   - playing with a toy
+   - sitting in a basket
+   style: A watercolor illustration of
+   subject: A cute kitten with sleek, cream-colored fur and striking blue eyes
+ - concept_token: puppy
+   settings:
+   - in a pet store
+   - in a grassy yard
+   - wearing a training harness
+   - dressed in a bandana
+   - wearing a life vest
+   style: ''
+   subject: A puppy
+ - concept_token: puppy
+   settings:
+   - wearing a small sweater
+   - digging a hole
+   - wearing a training harness
+   - sticking head out of the car window
+   - swimming
+   - playing in the yard
+   - chasing a ball
+   style: A watercolor illustration of
+   subject: A puppy
+ - concept_token: cat
+   settings:
+   - playing with a yarn ball
+   - sleeping in a box
+   - wearing a fluffy collar
+   - climbing a tree
+   - sitting on a shelf
+   style: A hyper-realistic digital painting of
+   subject: A cat
+ - concept_token: cat
+   settings:
+   - wearing a bow tie
+   - dressed in a Halloween costume
+   - in a busy alley
+   - on a sled
+   - wearing a small bell
+   style: A hyper-realistic digital painting of
+   subject: A cat
+ - concept_token: kitten
+   settings:
+   - wearing a tiny hat
+   - on a couch
+   - in a hospital
+   - playing with a feather toy
+   - on a rooftop
+   style: ''
+   subject: A cute kitten with sleek, cream-colored fur and striking blue eyes
+ - concept_token: hedgehog
+   settings:
+   - in a cozy nest
+   - dressed in a miniature jacket
+   - wearing a small collar
+   - dressed in a festive outfit
+   - wearing a flower crown
+   style: A 3D animation of
+   subject: A happy hedgehog
+ - concept_token: horse
+   settings:
+   - in a stable
+   - jumping over a hurdle
+   - in a snowy field
+   - in a mountain trail
+   - in a busy street
+   style: A hyper-realistic digital painting of
+   subject: A Palomino horse with a golden coat and a flowing, white mane and tail
+ fairy_tales:
+ - concept_token: fairy
+   settings:
+   - fluttering among glowing fireflies
+   - sprinkling pixie dust in the air
+   - hiding inside a blooming flower
+   - weaving spells under an ancient tree
+   - guiding travelers with a magical lantern
+   style: A magical drawing of
+   subject: A delicate fairy with sparkling wings
+ - concept_token: princess
+   settings:
+   - walking in a garden of enchanted roses
+   - gazing at the stars from a castle balcony
+   - singing to woodland animals in a forest
+   - wearing a shimmering gown at a royal ball
+   - seeking wisdom from an old magical mirror
+   style: A dreamy illustration of
+   subject: A beautiful princess with a kind smile
+ - concept_token: troll
+   settings:
+   - lurking beneath an old stone bridge
+   - guarding a cave filled with treasures
+   - chasing intruders through dark woods
+   - sharpening a massive club near a fire
+   - watching over a mountain pass silently
+   style: A dark fantasy depiction of
+   subject: A menacing troll with rough skin
+ - concept_token: witch
+   settings:
+   - stirring a bubbling potion in a cauldron
+   - flying on a broomstick under a full moon
+   - casting spells in a candlelit cabin
+   - gathering herbs in a misty forest
+   - reading incantations from an ancient book
+   style: A dark and enchanting artwork of
+   subject: A mysterious witch in flowing robes
+ - concept_token: phoenix
+   settings:
+   - rising from ashes in a fiery burst
+   - soaring through the golden clouds of dawn
+   - perched atop a flaming tree
+   - lighting the dark forest with its glow
+   - circling a distant volcano in flight
+   style: A vibrant fantasy drawing of
+   subject: A majestic phoenix with flaming wings
+ - concept_token: elf
+   settings:
+   - crafting a bow from enchanted wood
+   - guiding travelers through an ancient forest
+   - practicing archery under a silver moon
+   - guarding a hidden woodland village
+   - reading an ancient map by firelight
+   style: A detailed character design of
+   subject: A graceful elf with pointed ears
+ - concept_token: snow queen
+   settings:
+   - commanding a storm in an icy palace
+   - creating snowflakes with a wave of her hand
+   - watching over a frozen kingdom
+   - walking through a glittering ice cave
+   - sitting on a throne of frosted crystals
+   style: A majestic winter-themed artwork of
+   subject: A regal snow queen with icy beauty
+ - concept_token: goblin
+   settings:
+   - sneaking into a hidden treasure vault
+   - crafting traps in a dark cave
+   - trading stolen trinkets at a market
+   - hiding from sunlight under a tree
+   - scavenging through enchanted ruins
+   style: A mischievous fantasy depiction of
+   subject: A cunning goblin with sharp features
+ - concept_token: magic book
+   settings:
+   - floating in midair, pages turning
+   - glowing faintly in a candlelit room
+   - locked with an ornate golden clasp
+   - whispering spells to its reader
+   - "lying open on a wizard\u2019s desk"
+   style: A mystical object illustration of
+   subject: An ancient magic book filled with secrets
+ - concept_token: unicorn
+   settings:
+   - galloping through a dense, enchanted forest
+   - drinking from a crystalline stream
+   - standing in a meadow of blooming flowers
+   - beneath a rainbow in a serene valley
+   - resting in the shadow of ancient ruins
+   - among the clouds at dawn
+   style: A magical artwork of
+   subject: A unicorn with a gleaming silver horn
+ - concept_token: fairy
+   settings:
+   - hovering over a moonlit pond
+   - dancing on the petals of a giant flower
+   - hiding in the hollow of an ancient tree
+   - spreading fairy dust over a sleeping village
+   - sitting on a mushroom in a magical forest
+   - playing with fireflies at dusk
+   - weaving through a field of wildflowers
+   style: A whimsical painting of
+   subject: A delicate fairy with translucent wings
+ - concept_token: wizard
+   settings:
+   - in a tower filled with ancient tomes and artifacts
+   - casting a spell by the light of a full moon
+   - standing before a magical portal in the forest
+   - summoning a storm over a mountain peak
+   - writing runes in a dusty spellbook
+   - mixing potions in a dimly lit chamber
+   - consulting a crystal ball
+   style: A mystical illustration of
+   subject: A wise wizard with a long, flowing beard
+ - concept_token: griffin
+   settings:
+   - soaring over a sprawling desert
+   - perched on a high cliff watching the horizon
+   - standing guard over a hidden treasure
+   - flying above a lush green valley
+   - resting on a rocky outcrop at sunset
+   - gliding through the clouds in a clear sky
+   - hunting in a dense forest
+   style: A majestic painting of
+   subject: A majestic griffin with golden feathers
+ - concept_token: centaur
+   settings:
+   - galloping through a grassy plain at dawn
+   - patrolling the edge of a dense forest
+   - leading a group of warriors through a mountain pass
+   - beside a river under a twilight sky
+   - practicing archery in a secluded glade
+   - at the foot of a towering cliff
+   - under a sky filled with storm clouds
+   style: An adventurous illustration of
+   subject: A powerful centaur with a bow
+ - concept_token: nymph
+   settings:
+   - dancing on a moonlit pond
+   - singing beside a sparkling waterfall
+   - playing with fish in a crystal-clear stream
+   - resting among blooming lilies
+   - under a weeping willow at dawn
+   - in a serene glade surrounded by flowers
+   - floating in a tranquil lake
+   - gathering dew at the break of day
+   style: A serene painting of
+   subject: A delicate water nymph with flowing hair
+ - concept_token: troll
+   settings:
+   - under a stone bridge covered in ivy
+   - guarding a treasure chest in a dark cave
+   - helping travelers across a river
+   - sitting by a campfire in a foggy forest
+   - building a shelter from fallen logs
+   - fishing in a quiet stream at dusk
+   - carving runes into a rock
+   - resting under a large oak tree
+   style: A heartwarming illustration of
+   subject: A friendly troll with moss-covered skin
+ - concept_token: ogre
+   settings:
+   - living in a cozy cave in the forest
+   - fishing in a quiet lake under a cloudy sky
+   - helping lost travelers find their way
+   - gathering berries in a sunlit meadow
+   - sitting by a roaring campfire
+   - playing with forest animals in a glade
+   - building a shelter from fallen logs
+   - carrying a bundle of firewood through the forest
+   style: A friendly depiction of
+   subject: A gentle ogre with a broad smile
+ - concept_token: dwarf
+   settings:
+   - mining for gems in a glittering cave
+   - crafting weapons at a forge
+   - drinking ale in a bustling tavern
+   - exploring a network of underground tunnels
+   - climbing a steep mountain path
+   - gathering herbs in a forest clearing
+   - carving runes into stone tablets
+   - resting by a roaring fireplace
+   style: A robust painting of
+   subject: A sturdy dwarf with a thick beard
+ - concept_token: goblin
+   settings:
+   - sneaking through a dark market at midnight
+   - playing tricks on travelers in a village
+   - hiding in the shadows of a narrow alleyway
+   - stealing shiny objects from a merchant's stall
+   - scurrying through a dense forest at twilight
+   - climbing the walls of a deserted castle
+   - laughing by a flickering campfire
+   - running through an underground tunnel
+   style: A quirky artwork of
+   subject: A mischievous goblin with sharp features
+ - concept_token: witch
+   settings:
+   - brewing a potion in a bubbling cauldron
+   - casting a spell in a dark, enchanted forest
+   - flying on a broomstick under a full moon
+   - preparing charms in a cluttered cottage
+   - consulting an ancient grimoire by candlelight
+   - in a shadowy glade surrounded by glowing eyes
+   - watching over a bubbling potion in a misty cave
+   style: A mysterious painting of
+   subject: A cunning witch with a pointy hat
+ - concept_token: satyr
+   settings:
+   - dancing in a moonlit glade
+   - playing tunes beside a babbling brook
+   - hiding behind a tree in a sun-dappled forest
+   - leading a merry chase through a meadow
+   - resting on a boulder in a twilight grove
+   - frolicking among wildflowers
+   - drinking from a natural spring
+   - singing under the light of the full moon
+   style: A playful painting of
+   subject: A mischievous satyr with a reed pipe
+ - concept_token: dryad
+   settings:
+   - resting against an ancient oak tree
+   - walking through a sunlit grove
+   - dancing with the wind in a forest clearing
+   - watching over a grove of saplings
+   - playing with woodland creatures in the morning mist
+   - singing to the birds in the early dawn light
+   - weaving flowers into her hair
+   - resting in the shade of a grand tree
+   style: A naturalistic painting of
+   subject: A serene dryad with leafy hair
+ - concept_token: princess
+   settings:
+   - walking in a garden of enchanted roses
+   - gazing at the stars from a castle balcony
+   - singing to woodland animals in a forest
+   - wearing a shimmering gown at a royal ball
+   - seeking wisdom from an old magical mirror
+   style: A dreamy illustration of
+   subject: A beautiful princess
+ - concept_token: wizard
+   settings:
+   - casting spells in a mystical tower
+   - studying ancient tomes in a library
+   - summoning creatures from a cauldron
+   - battling a dragon in a dark forest
+   - creating potions under a full moon
+   style: A mystical painting of
+   subject: A wise wizard
+ - concept_token: witch
+   settings:
+   - stirring a potion in a bubbling cauldron
+   - flying on a broom under a full moon
+   - reading from a spellbook in a dark hut
+   - casting a curse in a misty forest
+   - collecting herbs in a hidden glade
+   style: A dark and eerie drawing of
+   subject: A mysterious witch
+ - concept_token: troll
+   settings:
+   - guarding a stone bridge in the mountains
+   - sitting by a campfire in the woods
+   - threatening travelers in a dark cave
+   - wandering through a foggy swamp
+   - grumbling in an underground lair
+   style: A rugged drawing of
+   subject: A grumpy troll
+ - concept_token: fairy godmother
+   settings:
+   - waving a wand to transform pumpkins
+   - granting wishes in a sparkling forest
+   - offering advice to a princess
+   - preparing magical gifts in a cozy home
+   - visiting a poor family with a blessing
+   style: A heartwarming illustration of
+   subject: A kind fairy godmother
+ - concept_token: ogre
+   settings:
+   - stomping through a village in a rage
+   - hiding in a dark cave
+   - eating a giant feast by a fire
+   - chasing heroes through the forest
+   - resting by a muddy river
+   style: A frightening drawing of
+   subject: A fearsome ogre
+ - concept_token: gnome
+   settings:
+   - working in a hidden underground garden
+   - sitting by a cozy fireplace with a pipe
+   - traveling with a mushroom cart
+   - tending to the plants in a forest
+   - playing tricks in a quiet village
+   style: A charming sketch of
+   subject: A friendly gnome
+ - concept_token: goblin
+   settings:
+   - sneaking through a dark alley
+   - working in a dank forge
+   - hoarding treasure in a hidden cave
+   - stealing from travelers in the night
+   - brewing potions in a dark hut
+   style: A sneaky illustration of
+   subject: A cunning goblin
+ - concept_token: faun
+   settings:
+   - playing a flute by a bubbling stream
+   - dancing in a circle of mushrooms
+   - guiding travelers through a magical forest
+   - leading a procession of woodland creatures
+   - sitting by a fire telling stories
+   style: A peaceful sketch of
+   subject: A playful faun
+ - concept_token: phoenix
+   settings:
+   - rising from the ashes in a burst of fire
+   - flying through a fiery sky
+   - resting on a burning tree
+   - appearing in a flash of flame
+   - spreading its wings over a burning city
+   style: A fiery illustration of
+   subject: A majestic phoenix
+ - concept_token: unicorn
+   settings:
+   - galloping through a flower field
+   - standing by a sparkling waterfall
+   - racing across a rainbow bridge
+   - protecting a sacred grove
+   - spreading magic in the forest
+   style: A colorful illustration of
+   subject: A graceful unicorn
+ - concept_token: witch
+   settings:
+   - flying on a broomstick under a full moon
+   - brewing a potion in a bubbling cauldron
+   - reading spells in an ancient book
+   - preparing a magical brew in a forest
+   - cursing a lost traveler in the woods
+   style: A dark drawing of
+   subject: A mysterious witch
+ - concept_token: elf
+   settings:
+   - singing songs of old in a moonlit grove
+   - crafting bows and arrows in a woodland cabin
+   - gathering herbs in an enchanted forest
+   - fighting in a battle with a gleaming sword
+   - celebrating with fellow elves by a fire
+   style: A serene painting of
+   subject: A wise elf
+ - concept_token: troll
+   settings:
+   - lurking under a bridge
+   - hiding in a cave at night
+   - eating mushrooms in a dark forest
+   - grumbling as it patrols the woods
+   - guarding a treasure chest
+   style: A grumpy drawing of
+   subject: A massive troll
+ - concept_token: siren
+   settings:
+   - singing to passing sailors
+   - playing an enchanted harp on a rocky shore
+   - luring ships with a beautiful melody
+   - hiding in the depths of the sea
+   - sitting on a cliff watching the sunset
+   style: A haunting painting of
+   subject: A seductive siren
+ - concept_token: pegasus
+   settings:
+   - soaring over a green valley
+   - flying through the clouds above a mountain range
+   - galloping through the sky with wings spread wide
+   - racing along the edge of the ocean
+   - resting on a soft cloud at dawn
+   style: A majestic painting of
+   subject: A graceful pegasus
+ - concept_token: gnome
+   settings:
+   - tinkering in an underground workshop
+   - working in a lush garden
+   - guarding the entrance to a secret cave
+   - sitting on a mushroom with a pipe
+   - creating magical trinkets for travelers
+   style: A cozy illustration of
+   subject: A friendly gnome
+ - concept_token: faun
+   settings:
+   - playing a flute by a gentle stream
+   - dancing among ancient trees
+   - gathering flowers for a wreath
+   - guiding lost travelers through the forest
+   - resting under a giant oak tree
+   style: A whimsical drawing of
+   subject: A playful faun
+ fantasy:
+ - concept_token: griffin
+   settings:
+   - soaring above a golden desert
+   - nesting on a mountain cliff
+   - hunting in a forest at dawn
+   - standing proudly in front of a castle
+   - flying through a stormy sky
+   style: A majestic and powerful illustration of
+   subject: A griffin with the body of a lion and the wings of an eagle
+ - concept_token: centaur
+   settings:
+   - running through an open meadow
+   - playing a tune on a flute
+   - practicing swordsmanship in a field
+   - sitting by a campfire at night
+   - galloping through a forest trail
+   style: A heroic nature illustration of
+   subject: A centaur with the body of a horse and the torso of a warrior
+ - concept_token: mermaid
+   settings:
+   - swimming in a coral reef
+   - basking on a sunlit rock
+   - combing her hair with a shell
+   - singing in an underwater cave
+   - playing with colorful fish
+   style: A dreamy underwater illustration of
+   subject: A beautiful mermaid with a shimmering tail
+ - concept_token: wizard
+   settings:
+   - casting spells in a tower
+   - reading ancient books in a library
+   - brewing potions in a dark cave
+   - summoning creatures in a circle
+   - wandering through a mystical forest
+   style: A mystical and powerful illustration of
+   subject: A wise wizard with a long, flowing beard
+ - concept_token: werewolf
+   settings:
+   - howling at the full moon
+   - prowling through a misty forest
+   - transforming in the light of the moon
+   - hunting in the dark woods
+   - running on all fours under a starry sky
+   style: A terrifying and wild illustration of
+   subject: A werewolf with glowing yellow eyes
+ - concept_token: ogre
+   settings:
+   - stomping through a muddy swamp
+   - sitting by a campfire roasting food
+   - swinging a massive club in battle
+   - chasing intruders in the forest
+   - resting in a cave with treasure
+   style: A brutish and strong illustration of
+   subject: A large ogre with green skin and a rough demeanor
+ - concept_token: siren
+   settings:
+   - singing on a rocky cliff
+   - luring sailors to their doom
+   - swimming in the deep ocean
+   - resting on a wave-kissed shore
+   - weaving spells with her voice
+   style: A haunting and alluring illustration of
+   subject: A siren with long flowing hair and a melodious voice
+ - concept_token: elf queen
+   settings:
+   - sitting on a crystal throne
+   - overseeing the elven kingdom
+   - speaking with ancient spirits
+   - leading her people into battle
+   - walking through a mystical forest
+   style: A regal and elegant illustration of
+   subject: An elf queen with a jeweled crown
+ - concept_token: hobbit
+   settings:
+   - lounging by a cozy fireplace
+   - planting flowers in a garden
+   - enjoying a feast at home
+   - walking through the rolling hills
+   - smoking a pipe under a tree
+   style: A peaceful and rustic illustration of
+   subject: A hobbit with large feet and a warm smile
+ - concept_token: mermaid
+   settings:
+   - swimming in a coral reef
+   - basking on a sunlit rock
+   - combing her hair with a shell
+   - singing in an underwater cave
+   - playing with colorful fish
+   style: A dreamy underwater illustration of
+   subject: A beautiful mermaid with a shimmering tail
+ - concept_token: goblin
+   settings:
+   - sneaking into a treasure cave
+   - trading trinkets at a market
+   - sharpening weapons in a lair
+   - arguing over a shiny object
+   - lurking in the shadows of ruins
+   style: A dark fantasy illustration of
+   subject: A mischievous goblin with sharp teeth
+ - concept_token: sorceress
+   settings:
+   - conjuring a glowing orb
+   - surrounded by enchanted flames
+   - chanting in a stone circle
+   - reading from an ancient tome
+   - standing by a bubbling cauldron
+   style: A captivating fantasy portrait of
+   subject: A powerful sorceress in elegant robes
+ - concept_token: dwarf
+   settings:
+   - forging weapons in a fiery forge
+   - mining deep underground
+   - standing guard by a stone door
+   - drinking from a golden goblet
+   - carrying a heavy battleaxe
+   style: A gritty fantasy artwork of
+   subject: A stout dwarf with a long beard
+ - concept_token: troll
+   settings:
+   - sitting on a mossy boulder
+   - guarding a bridge in the woods
+   - clutching a massive club
+   - chasing intruders from a cave
+   - growling in the moonlight
+   style: A dark and menacing depiction of
+   subject: A hulking troll with a fearsome appearance
+ - concept_token: fairy
+   settings:
+   - wearing gossamer wings
+   - frolicking in the air
+   - wearing a crown of twigs
+   - near a shimmering brook
+   - dancing in moonlight
+   - in an enchanted forest
+   - sitting on a mushroom
+   style: A watercolor illustration of
+   subject: A fairy
+ - concept_token: elf
+   settings:
+   - in an elven city of trees
+   - cleaning a sword
+   - reading an ancient text
+   - flying in the air
+   - in a library of scrolls
+   - meditating by a waterfall
+   - at a council meeting
+   style: A photo of
+   subject: An elf with long, silver hair, eyes like polished amethyst, and ears that
+     curve elegantly upwards
+ - concept_token: unicorn
+   settings:
+   - prancing near the water
+   - beside a crystal lake
+   - wearing a silver bridle
+   - under a canopy of stars
+   - wearing a harness of sunbeams
+   - in a magical forest
+   style: A black and white sketch of
+   subject: A unicorn
+ - concept_token: goblin
+   settings:
+   - wearing a cloak
+   - haggling over goods
+   - wearing a belt of tools
+   - dressed in a guard's uniform
+   - in a dungeon's depths
+   - scavenging for treasures
+   style: A 3D animation of
+   subject: A goblin with patchy, leathery skin and oversized ears, carrying a tiny,
+     glowing lantern
+ - concept_token: fairy
+   settings:
+   - in a mystical glen
+   - atop a dew-covered flower
+   - inside a hollowed-out tree
+   - collecting morning dew
+   - under a full moon
+   - dancing in the rain
+   style: A hyper-realistic digital painting of
+   subject: A fairy
+ - concept_token: leprechaun
+   settings:
+   - counting gold coins
+   - at the end of a rainbow
+   - setting a trap
+   - in an underground workshop
+   - performing a jig
+   - hiding in the shadows
+   style: A hyper-realistic digital painting of
+   subject: A leprechaun with a mischievous grin, emerald-green coat, and a hat adorned
+     with a four-leaf clover
+ - concept_token: fairy
+   settings:
+   - dressed in a cloak of spider silk
+   - tending to forest creatures
+   - in a mystical glen
+   - frolicking in the air
+   - wearing a garland of fireflies
+   - sleeping in a flower
+   style: A hyper-realistic digital painting of
+   subject: A fairy
+ - concept_token: leprechaun
+   settings:
+   - in a clover field
+   - mending a shoe
+   - inside a hollow oak tree
+   - sharing a pint
+   - at a village fair
+   - hiding a pot of gold
+   style: ''
+   subject: A leprechaun
+ foods:
+ - concept_token: loaf of bread
+   settings:
+   - cooling on a rustic wooden board
+   - sliced and served with butter
+   - displayed at a bustling farmer's market
+   - resting in a woven basket
+   - paired with a bowl of hearty soup
+   style: A warm countryside painting of
+   subject: A crusty loaf of freshly baked bread
+ - concept_token: spaghetti
+   settings:
+   - topped with a rich tomato sauce
+   - sprinkled with grated Parmesan cheese
+   - served with garlic bread on the side
+   - twirled on a fork over a plate
+   - accompanied by a glass of red wine
+   style: An Italian-inspired artwork of
+   subject: A steaming plate of spaghetti with marinara sauce
+ - concept_token: croissant
+   settings:
+   - displayed in a Parisian bakery
+   - paired with a cup of espresso
+   - placed on a white porcelain plate
+   - in a basket lined with cloth
+   - "enjoyed at a sunny caf\xE9 table"
+   style: A soft and elegant painting of
+   subject: A buttery, flaky croissant
+ - concept_token: salad
+   settings:
+   - tossed with fresh greens and vegetables
+   - topped with a tangy vinaigrette
+   - served in a wooden bowl
+   - garnished with sliced avocado
+   - paired with a crusty bread roll
+   style: A fresh and vibrant painting of
+   subject: A garden-fresh salad full of color
+ - concept_token: macaron
+   settings:
+   - arranged delicately on a porcelain plate
+   - displayed in a Parisian patisserie
+   - served with a cup of tea
+   - stacked in a rainbow of colors
+   - wrapped in a gift box with ribbons
+   style: A refined and pastel-hued painting of
+   subject: A colorful array of delicate macarons
+ - concept_token: hot chocolate
+   settings:
+   - topped with whipped cream and cocoa
+   - served in a cozy mug by the fire
+   - accompanied by marshmallows on the side
+   - in a festive holiday-themed cup
+   - placed on a wooden tray with cookies
+   style: A warm and comforting painting of
+   subject: A steaming cup of hot chocolate
+ - concept_token: fried rice
+   settings:
+   - served in a traditional wok
+   - topped with green onions and egg
+   - accompanied by soy sauce on the side
+   - enjoyed at a bustling street market
+   - paired with a cup of jasmine tea
+   style: A vibrant Asian-inspired painting of
+   subject: A plate of colorful fried rice
+ - concept_token: cupcake
+   settings:
+   - topped with swirls of buttercream
+   - sprinkled with edible glitter
+   - placed on a decorative stand
+   - served at a birthday celebration
+   - paired with a glass of milk
+   style: A fun and festive painting of
+   subject: A perfectly decorated cupcake
+ - concept_token: bagel
+   settings:
+   - topped with cream cheese and lox
+   - "displayed in a cozy caf\xE9"
+   - served with a side of fresh fruit
+   - toasted to a golden brown
+   - wrapped for a quick breakfast
+   style: A simple and inviting painting of
+   subject: A classic bagel with delicious toppings
+ - concept_token: chocolate cake
+   settings:
+   - displayed in a high-end bakery window
+   - served at a birthday celebration
+   - enjoyed with a glass of red wine
+   - highlighted in a dessert cookbook
+   - paired with a scoop of ice cream
+   - presented at a wedding reception
+   - enjoyed during a romantic dinner
+   style: A digital illustration of
+   subject: A decadent chocolate cake with layers of rich ganache and fresh strawberries
+ - concept_token: bowl of ramen
+   settings:
+   - served in a bustling ramen shop
+   - highlighted in a food blog
+   - enjoyed on a chilly evening
+   - paired with a side of gyoza
+   - featured in a travel documentary
+   - prepared by a master chef
+   - showcased at a food festival
+   - enjoyed at a street food stall
+   style: A ukiyo-e style woodblock print of
+   subject: A steaming bowl of ramen with tender pork slices, soft-boiled egg, and
+     vibrant vegetables
+ - concept_token: plate of pasta
+   settings:
+   - served at an Italian trattoria
+   - enjoyed during a family dinner
+   - highlighted in a cooking show
+   - paired with a glass of red wine
+   - featured in a gourmet magazine
+   - prepared by a nonna
+   - showcased at a food festival
+   - enjoyed in a rustic kitchen
+   style: A realistic still life painting of
+   subject: A plate of pasta with homemade marinara sauce, fresh basil, and parmesan
+     cheese
+ - concept_token: ice cream sundae
+   settings:
+   - served at a vintage ice cream parlor
+   - enjoyed during a summer fair
+   - highlighted in a dessert cookbook
+   - paired with a brownie
+   - featured in a children's party
+   - prepared for a special treat
+   - displayed in a colorful bowl
+   style: A whimsical illustration of
+   subject: An indulgent ice cream sundae with chocolate syrup, whipped cream, and
+     a cherry on top
+ - concept_token: Caesar salad
+   settings:
+   - served at an upscale bistro
+   - enjoyed during a light lunch
+   - highlighted in a health food magazine
+   - paired with grilled chicken
+   - featured in a cooking class
+   - prepared for a summer picnic
+   style: A botanical illustration of
+   subject: A classic Caesar salad with crisp romaine, parmesan, and creamy dressing
+ - concept_token: bowl of gazpacho
+   settings:
+   - served at a Spanish tapas bar
+   - enjoyed during a summer evening
+   - highlighted in a culinary travel show
+   - paired with crusty bread
+   - featured in a Mediterranean cookbook
+   - prepared by a home chef
+   - showcased at a garden party
+   - enjoyed on a sunny patio
+   style: A cubist painting of
+   subject: A refreshing bowl of gazpacho with fresh tomatoes, cucumbers, and a drizzle
+     of olive oil
+ - concept_token: cheesecake
+   settings:
+   - served at a New York deli
+   - enjoyed during a special celebration
+   - highlighted in a dessert cookbook
+   - paired with a glass of dessert wine
+   - featured in a pastry shop
+   - prepared for a holiday gathering
+   style: A detailed ink drawing of
+   subject: A creamy cheesecake with a graham cracker crust and a raspberry swirl
+ - concept_token: bowl of chili
+   settings:
+   - served at a chili cook-off
+   - enjoyed during a cold winter day
+   - highlighted in a comfort food magazine
+   - paired with cornbread
+   - featured in a family recipe book
+   - prepared for a game day party
+   - showcased at a food truck
+   style: A Western-style painting of
+   subject: A hearty bowl of chili with ground beef, beans, and a spicy tomato sauce
+ - concept_token: lobster roll
+   settings:
+   - served at a seaside shack
+   - enjoyed during a summer vacation
+   - highlighted in a seafood magazine
+   - paired with a crisp white wine
+   - featured in a coastal restaurant
+   - prepared for a beach picnic
+   style: A nautical-themed illustration of
+   subject: A succulent lobster roll with tender lobster meat and a buttery bun
+ - concept_token: bowl of poke
+   settings:
+   - served at a Hawaiian luau
+   - enjoyed during a beach day
+   - highlighted in a health food magazine
+   - paired with a tropical smoothie
+   - featured in a poke bar
+   - prepared for a summer party
+   - showcased at a food festival
1005
+ style: A vibrant watercolor of
1006
+ subject: A fresh bowl of poke with marinated fish, avocado, and rice
1007
+ - concept_token: bowl of pad thai
1008
+ settings:
1009
+ - served at a Thai street food stall
1010
+ - enjoyed during a warm evening
1011
+ - highlighted in a travel documentary
1012
+ - paired with a cold beer
1013
+ - featured in an Asian cookbook
1014
+ - prepared by a local chef
1015
+ - showcased at a food market
1016
+ - enjoyed during a festival
1017
+ style: A traditional Thai mural of
1018
+ subject: A flavorful bowl of pad thai with shrimp, peanuts, and fresh lime
1019
+ humans:
1020
+ - concept_token: scientist
1021
+ settings:
1022
+ - conducting an experiment in a lab
1023
+ - analyzing data on a computer
1024
+ - presenting research at a conference
1025
+ - observing through a microscope
1026
+ - writing a research paper
1027
+ style: A focused scene of
1028
+ subject: A scientist with a lab coat
1029
+ - concept_token: lawyer
1030
+ settings:
1031
+ - defending a client in court
1032
+ - negotiating a contract in an office
1033
+ - reviewing legal documents
1034
+ - advising clients on legal matters
1035
+ - preparing for a trial
1036
+ style: A professional scene of
1037
+ subject: A lawyer in a suit
1038
+ - concept_token: waiter
1039
+ settings:
1040
+ - serving food to customers in a restaurant
1041
+ - taking an order at a table
1042
+ - cleaning a table after customers leave
1043
+ - delivering drinks to a bar
1044
+ - setting a table for dinner
1045
+ style: A welcoming scene of
1046
+ subject: A waiter in a uniform
1047
+ - concept_token: tailor
1048
+ settings:
1049
+ - measuring a client for a suit
1050
+ - stitching a dress by hand
1051
+ - fitting a shirt for alterations
1052
+ - ironing clothes in a workshop
1053
+ - arranging fabrics on a cutting table
1054
+ style: A precise scene of
1055
+ subject: A tailor with a sewing machine
1056
+ - concept_token: scientist
1057
+ settings:
1058
+ - conducting research in a lab
1059
+ - analyzing samples under a microscope
1060
+ - experimenting with chemicals
1061
+ - presenting findings at a conference
1062
+ - writing a research paper
1063
+ style: A focused scene of
1064
+ subject: A scientist with a lab coat
1065
+ - concept_token: barber
1066
+ settings:
1067
+ - cutting hair in a barbershop
1068
+ - trimming a beard with scissors
1069
+ - styling a client's hair
1070
+ - shaving a client with a razor
1071
+ - cleaning up the work area
1072
+ style: A stylish scene of
1073
+ subject: A barber with clippers
1074
+ - concept_token: dancer
1075
+ settings:
1076
+ - performing ballet on stage
1077
+ - practicing in a dance studio
1078
+ - rehearsing for a performance
1079
+ - dancing at a wedding
1080
+ - choreographing a new routine
1081
+ style: A graceful scene of
1082
+ subject: A dancer in a tutu
1083
+ - concept_token: dentist
1084
+ settings:
1085
+ - cleaning teeth in a dental office
1086
+ - performing a root canal procedure
1087
+ - explaining oral hygiene to a patient
1088
+ - taking X-rays of teeth
1089
+ - filling a cavity during an appointment
1090
+ style: A medical scene of
1091
+ subject: A dentist in scrubs
1092
+ - concept_token: lawyer
1093
+ settings:
1094
+ - cross-examining a witness in court
1095
+ - preparing a legal case with colleagues
1096
+ - advising clients on legal matters
1097
+ - filing legal documents in an office
1098
+ - negotiating a settlement agreement
1099
+ style: A professional scene of
1100
+ subject: A lawyer in a suit
1101
+ - concept_token: optometrist
1102
+ settings:
1103
+ - conducting an eye exam with a patient
1104
+ - fitting eyeglasses for a client
1105
+ - examining eye charts in a clinic
1106
+ - discussing eye health with a patient
1107
+ - testing vision with an automated device
1108
+ style: A precise scene of
1109
+ subject: An optometrist with a phoropter
1110
+ - concept_token: gardener
1111
+ settings:
1112
+ - planting flowers in a garden
1113
+ - trimming hedges in a park
1114
+ - watering plants in a greenhouse
1115
+ - harvesting vegetables in a field
1116
+ - arranging flowers in a basket
1117
+ style: A vibrant scene of
1118
+ subject: A gardener with gloves and a straw hat
1119
+ - concept_token: gentleman
1120
+ settings:
1121
+ - reading a newspaper
1122
+ - at a vintage car show
1123
+ - in a vineyard
1124
+ - swinging a golf club
1125
+ - listening to music
1126
+ - dining at a fancy restaurant
1127
+ style: A 3D animation of
1128
+ subject: An elderly gentleman with distinguished gray hair, a neatly trimmed mustache,
1129
+ and gentle blue eyes
1130
+ - concept_token: woman
1131
+ settings:
1132
+ - wearing a gardening apron
1133
+ - at a seaside pier
1134
+ - in a rose garden
1135
+ - in a cozy library
1136
+ - on a city street in the 1950s
1137
+ style: A pixar style illustration of
1138
+ subject: A woman with a slender figure, straight red hair, and freckles across the
1139
+ nose
1140
+ - concept_token: woman
1141
+ settings:
1142
+ - baking cookies
1143
+ - at a seaside pier
1144
+ - in a rose garden
1145
+ - dressed in a vintage 1950s outfit
1146
+ - wearing a knitted shawl
1147
+ - painting a landscape
1148
+ - walking a dog
1149
+ - "drinking coffee at a caf\xE9"
1150
+ - taking photographs
1151
+ - attending a fair
1152
+ style: A hyper-realistic digital painting of
1153
+ subject: A woman with a slender figure, straight red hair, and freckles across the
1154
+ nose
1155
+ - concept_token: gentleman
1156
+ settings:
1157
+ - in a classic study
1158
+ - at a vintage car show
1159
+ - wearing a vineyard owner's attire
1160
+ - on a golf course
1161
+ - on a mountain
1162
+ - walking with a cane
1163
+ - discussing art at a gallery
1164
+ - tending to a garden
1165
+ - having a leisurely walk
1166
+ - enjoying a sunset
1167
+ style: A pixel art depiction of
1168
+ subject: An elderly gentleman with distinguished gray hair, a neatly trimmed mustache,
1169
+ and gentle blue eyes
1170
+ - concept_token: girl
1171
+ settings:
1172
+ - on a playground
1173
+ - in a flower garden
1174
+ - building a sandcastle
1175
+ - reading a fairy tale book
1176
+ - in a city park with autumn leaves
1177
+ - drawing in a sketchbook
1178
+ - taking a walk with friends
1179
+ - shopping at a boutique
1180
+ style: A hyper-realistic digital painting of
1181
+ subject: a 16 years old girl with wavy chestnut hair, a slender frame, and soft
1182
+ brown eyes
1183
+ - concept_token: boy
1184
+ settings:
1185
+ - green knitted hat
1186
+ - at an urban skatepark
1187
+ - in a dense forest
1188
+ - dressed as an astronaut
1189
+ - digging with a trowel
1190
+ - climbing a tree
1191
+ - attending a science fair
1192
+ style: A hyper-realistic digital painting of
1193
+ subject: A teenage boy with short, spiky black hair, a slight build, and dark brown
1194
+ eyes
1195
+ - concept_token: child
1196
+ settings:
1197
+ - in a toy store
1198
+ - exploring an exhibit
1199
+ - in a backyard
1200
+ - dressed in a prince costume
1201
+ - in a treehouse
1202
+ - playing with a puppy
1203
+ - enjoying a carousel ride
1204
+ - building a fort
1205
+ style: A watercolor illustration of
1206
+ subject: A male child with a round face, short ginger hair, and curious, wide eyes
1207
+ - concept_token: woman
1208
+ settings:
1209
+ - in a vintage kitchen
1210
+ - dressed in a formal evening gown
1211
+ - in a rose garden
1212
+ - reading a novel
1213
+ - shopping at a market
1214
+ - visiting a museum
1215
+ - having a picnic in a park
1216
+ style: A photo of
1217
+ subject: A woman with a slender figure, straight red hair, and freckles across the
1218
+ nose
1219
+ - concept_token: woman
1220
+ settings:
1221
+ - baking a cake
1222
+ - dressed in a formal evening gown
1223
+ - wearing a chef's hat and apron
1224
+ - holding a box
1225
+ - in a snowy forest
1226
+ - attending a holiday party
1227
+ - singing at a concert
1228
+ style: A watercolor illustration of
1229
+ subject: A woman with a slender figure, straight red hair, and freckles across the
1230
+ nose
1231
+ - concept_token: girl
1232
+ settings:
1233
+ - wearing a ballet dress
1234
+ - in a flower garden
1235
+ - building a sandcastle
1236
+ - in a medieval castle courtyard
1237
+ - wearing a birthday party hat
1238
+ - dancing in a rainstorm
1239
+ - writing in a journal
1240
+ - practicing yoga
1241
+ style: A pixel art depiction of
1242
+ subject: a 16 years old girl with wavy chestnut hair, a slender frame, and soft
1243
+ brown eyes
1244
+ inanimate:
1245
+ - concept_token: cloud
1246
+ settings:
1247
+ - floating on a postcard
1248
+ - stuck on a book cover
1249
+ - decorating a nursery wall
1250
+ - on a birthday card
1251
+ - placed on a phone case
1252
+ style: A photo of
1253
+ subject: A sticker of a fluffy white cloud
1254
+ - concept_token: starfish
1255
+ settings:
1256
+ - on a beach-themed photo album
1257
+ - stuck on a notebook cover
1258
+ - placed on a bathroom mirror
1259
+ - decorating a summer tote bag
1260
+ - on a seaside souvenir
1261
+ style: A photo of
1262
+ subject: A sticker of a detailed starfish
1263
+ - concept_token: butterfly
1264
+ settings:
1265
+ - placed on a floral scarf
1266
+ - decorating a greeting card
1267
+ - stuck on a wind chime
1268
+ - on a flowerpot in the garden
1269
+ - pinned on a corkboard
1270
+ style: A photo of
1271
+ subject: A sticker of a colorful butterfly
1272
+ - concept_token: rainbow
1273
+ settings:
1274
+ - decorating a birthday banner
1275
+ - stuck on a car window
1276
+ - placed on a child's lunchbox
1277
+ - on a keychain for good luck
1278
+ - on a window decal
1279
+ style: A photo of
1280
+ subject: A sticker of a vibrant rainbow
1281
+ - concept_token: umbrella
1282
+ settings:
1283
+ - decorating a rainy day card
1284
+ - stuck on a luggage tag
1285
+ - placed on a beach towel
1286
+ - on a coffee cup sleeve
1287
+ - as a window decal
1288
+ style: A photo of
1289
+ subject: A sticker of a colorful umbrella
1290
+ - concept_token: apple
1291
+ settings:
1292
+ - decorating a school notebook
1293
+ - stuck on a kitchen fridge
1294
+ - on a picnic basket lid
1295
+ - on a lunchbox
1296
+ - placed in a fruit bowl
1297
+ style: A photo of
1298
+ subject: A sticker of a shiny red apple
1299
+ - concept_token: bird
1300
+ settings:
1301
+ - perched on a window sill
1302
+ - stuck on a travel mug
1303
+ - decorating a greeting card
1304
+ - placed on a beach towel
1305
+ - on a decorative pillow
1306
+ style: A photo of
1307
+ subject: A sticker of a colorful bird
1308
+ - concept_token: pin
1309
+ settings:
1310
+ - decorating a corkboard
1311
+ - on a jean jacket
1312
+ - attached to a notebook cover
1313
+ - on a travel map
1314
+ - stuck on a greeting card
1315
+ style: A photo of
1316
+ subject: A sticker of a colorful pin
1317
+ - concept_token: hamburger
1318
+ settings:
1319
+ - placed on a restaurant menu
1320
+ - decorating a food-themed poster
1321
+ - stuck on a lunchbox
1322
+ - on a picnic blanket
1323
+ - in a fast food tray
1324
+ style: A photo of
1325
+ subject: A sticker of a delicious hamburger
1326
+ - concept_token: sun
1327
+ settings:
1328
+ - on a tropical vacation suitcase
1329
+ - stuck on a lemonade stand
1330
+ - decorating a summer beach ball
1331
+ - on a travel-themed scrapbook
1332
+ - placed on a sunlit balcony door
1333
+ style: A photo of
1334
+ subject: A sticker of a smiling yellow sun
1335
+ - concept_token: balloon
1336
+ settings:
1337
+ - on a birthday party invitation
1338
+ - floating on a festival flyer
1339
+ - stuck to a carnival ticket
1340
+ - decorating a child's bedroom wall
1341
+ - on a helium tank label
1342
+ style: A photo of
1343
+ subject: A sticker of a colorful balloon
1344
+ - concept_token: guitar
1345
+ settings:
1346
+ - on a musician's travel case
1347
+ - stuck on a band poster
1348
+ - decorating a music notebook
1349
+ - placed on a record store bag
1350
+ - on a concert ticket stub
1351
+ style: A photo of
1352
+ subject: A sticker of an acoustic guitar
1353
+ - concept_token: cupcake
1354
+ settings:
1355
+ - on a bakery box
1356
+ - decorating a birthday invitation
1357
+ - stuck on a sweet recipe book
1358
+ - placed on a tea party favor
1359
+ - on a dessert menu
1360
+ style: A photo of
1361
+ subject: A sticker of a frosted cupcake
1362
+ - concept_token: dog
1363
+ settings:
1364
+ - on a student's laptop
1365
+ - amidst an action-packed skate session
1366
+ - adding character to urban signage
1367
+ - at a live music gig
1368
+ - at an international airport
1369
+ - in a bustling city center
1370
+ - on a colorful street mural
1371
+ style: A photo of
1372
+ subject: A sticker of a cute corgi dog
1373
+ - concept_token: sofa
1374
+ settings:
1375
+ - occupied by a sleeping cat
1376
+ - in a Victorian-style parlor
1377
+ - in an outdoor patio
1378
+ - in a bustling coffee shop
1379
+ - with a fleece blanket
1380
+ - in a modern living room
1381
+ - beside a cozy fireplace
1382
+ - in a chic urban loft
1383
+ style: A hyper-realistic digital painting of
1384
+ subject: A plush velvet sofa in a rich emerald green color with elegant, curved
1385
+ armrests
1386
+ - concept_token: dog
1387
+ settings:
1388
+ - on a student's laptop
1389
+ - on a skateboard in a park
1390
+ - adding character to urban signage
1391
+ - decorated with band logos
1392
+ - on a traveler's suitcase
1393
+ - in a crowded subway
1394
+ - on a street artist's sketchbook
1395
+ - at a bustling flea market
1396
+ style: A hyper-realistic digital painting of
1397
+ subject: A sticker of a cute corgi dog
1398
+ - concept_token: lamp
1399
+ settings:
1400
+ - with a classic green shade
1401
+ - lighting up a late-night study session
1402
+ - casting a warm glow in an elegant setting
1403
+ - on a table
1404
+ - on the outside
1405
+ - in a minimalist workspace
1406
+ - by a cozy reading nook
1407
+ style: A 3D animation of
1408
+ subject: A modern desk lamp with a metallic finish and an adjustable, minimalist
1409
+ design
1410
+ - concept_token: mug
1411
+ settings:
1412
+ - being filled with steaming coffee
1413
+ - sitting beside a laptop and notebooks
1414
+ - on an office desk filled with papers
1415
+ - on a potter's wheel being shaped
1416
+ - on a balcony overlooking a cityscape
1417
+ - in a cozy kitchen
1418
+ - by a morning newspaper
1419
+ style: A hyper-realistic digital painting of
1420
+ subject: A red coffee mug shaped like a ball
1421
+ nature:
1422
+ - concept_token: sunflower
1423
+ settings:
1424
+ - stretching toward the midday sun
1425
+ - in a farmer's field
1426
+ - blooming in a garden patch
1427
+ - surrounded by bees on a hot day
1428
+ - against a bright blue sky
1429
+ style: A vibrant oil painting of
1430
+ subject: A cheerful sunflower
1431
+ - concept_token: deer
1432
+ settings:
1433
+ - grazing in a serene forest glade
1434
+ - stepping lightly through the underbrush
1435
+ - silhouetted against a rising moon
1436
+ - resting in a grassy meadow
1437
+ - bounding through a woodland clearing
1438
+ style: A detailed pencil drawing of
1439
+ subject: A graceful deer
1440
+ - concept_token: rainbow
1441
+ settings:
1442
+ - arching over a misty waterfall
1443
+ - after a spring rainstorm
1444
+ - across a golden wheat field
1445
+ - over a peaceful valley
1446
+ - against a backdrop of dark clouds
1447
+ style: A vivid acrylic painting of
1448
+ subject: A vibrant rainbow
1449
+ - concept_token: pine
1450
+ settings:
1451
+ - swaying in the mountain breeze
1452
+ - standing tall in a snow-covered forest
1453
+ - casting long shadows at dusk
1454
+ - surrounded by mist in a quiet glade
1455
+ - framing a scenic lakeside view
1456
+ style: A serene pencil sketch of
1457
+ subject: A tall pine tree
1458
+ - concept_token: butterfly
1459
+ settings:
1460
+ - flitting among garden flowers
1461
+ - resting on a green leaf
1462
+ - against a clear blue sky
1463
+ - hovering over a brook
1464
+ - dancing in tall grass
1465
+ style: A watercolor illustration of
1466
+ subject: A vibrant butterfly with iridescent wings
1467
+ - concept_token: oak
1468
+ settings:
1469
+ - tall in an autumn forest
1470
+ - casting shadows at sunset
1471
+ - blanketed in fresh snow
1472
+ - in a field of wildflowers
1473
+ - against rolling hills
1474
+ style: A realist landscape painting of
1475
+ subject: A mighty oak tree with spreading branches
1476
+ - concept_token: eagle
1477
+ settings:
1478
+ - soaring high above a cliff
1479
+ - perched on a mountain peak
1480
+ - gliding over a wide valley
1481
+ - flying against a clear blue sky
1482
+ - swooping over a forest
1483
+ style: A majestic watercolor of
1484
+ subject: A powerful eagle
1485
+ - concept_token: lily
1486
+ settings:
1487
+ - blooming in a tranquil pond
1488
+ - placed in a ceramic vase
1489
+ - floating on a still lake
1490
+ - beside a peaceful riverbank
1491
+ - surrounded by green leaves
1492
+ style: A soft pastel illustration of
1493
+ subject: A fragrant lily flower
1494
+ - concept_token: mushroom
1495
+ settings:
1496
+ - growing on the forest floor
1497
+ - nestled among fallen leaves
1498
+ - after a light spring rain
1499
+ - near a mossy tree trunk
1500
+ - surrounded by ferns and flowers
1501
+ style: A delicate pen-and-ink drawing of
1502
+ subject: A forest mushroom
1503
+ - concept_token: butterfly
1504
+ settings:
1505
+ - resting on a vibrant daisy
1506
+ - fluttering around a lavender bush
1507
+ - on the edge of a garden gate
1508
+ - perched on a sunflower
1509
+ - dancing in the spring breeze
1510
+ style: A soft watercolor of
1511
+ subject: A delicate butterfly
1512
+ - concept_token: fern
1513
+ settings:
1514
+ - unfurling in a forest glade
1515
+ - growing at the base of a tree
1516
+ - flourishing along a shaded trail
1517
+ - carpeting a rocky cliffside
1518
+ - swaying in the summer breeze
1519
+ style: A naturalistic pencil drawing of
1520
+ subject: A lush fern
1521
+ - concept_token: owl
1522
+ settings:
1523
+ - perched on a tree branch at dusk
1524
+ - flying through a moonlit forest
1525
+ - watching from a hollowed-out tree
1526
+ - gliding over a misty field
1527
+ - hooting in the silent night
1528
+ style: A realistic watercolor of
1529
+ subject: A wise owl in flight
1530
+ - concept_token: lily pad
1531
+ settings:
1532
+ - floating in a serene pond
1533
+ - under a blooming lotus flower
1534
+ - surrounded by dragonflies
1535
+ - near the edge of a brook
1536
+ - reflecting the moonlight
1537
+ style: A tranquil painting of
1538
+ subject: A green lily pad
1539
+ - concept_token: glacier
1540
+ settings:
1541
+ - towering above icy waters
1542
+ - shimmering in the polar sun
1543
+ - calving into the ocean
1544
+ - surrounded by snowy peaks
1545
+ - glowing in the arctic twilight
1546
+ style: A dramatic oil painting of
1547
+ subject: A massive glacier
1548
+ - concept_token: daisy
1549
+ settings:
1550
+ - blooming in a summer meadow
1551
+ - in a vase on a windowsill
1552
+ - swaying in a gentle breeze
1553
+ - nestled in a grassy field
1554
+ - with dew drops at dawn
1555
+ style: A cheerful watercolor of
1556
+ subject: A white daisy flower
1557
+ - concept_token: coral
1558
+ settings:
1559
+ - growing in a vibrant reef
1560
+ - home to tiny fish
1561
+ - illuminated by sunlight underwater
1562
+ - swaying gently with the currents
1563
+ - surrounded by blue water
1564
+ style: A detailed underwater painting of
1565
+ subject: A colorful coral
1566
+ - concept_token: willow
1567
+ settings:
1568
+ - by a peaceful riverbank
1569
+ - swaying in a summer breeze
1570
+ - shading a grassy picnic spot
1571
+ - with branches touching the water
1572
+ - against a cloudy sky
1573
+ style: A serene landscape painting of
1574
+ subject: A weeping willow tree
1575
+ - concept_token: dragonfly
1576
+ settings:
1577
+ - hovering above a pond
1578
+ - resting on a reed
1579
+ - darting through tall grass
1580
+ - shimmering in the morning light
1581
+ - near a field of wildflowers
1582
+ style: A vibrant sketch of
1583
+ subject: A delicate dragonfly
1584
+ - concept_token: coconut tree
1585
+ settings:
1586
+ - on a white sandy beach
1587
+ - swaying in a tropical breeze
1588
+ - shading a beachside hut
1589
+ - with coconuts hanging high
1590
+ - against a turquoise sky
1591
+ style: A tropical watercolor of
1592
+ subject: A tall coconut tree
1593
+ - concept_token: moss
1594
+ settings:
1595
+ - covering a forest stone
1596
+ - growing on an old tree trunk
1597
+ - carpeting a woodland floor
1598
+ - near a babbling brook
1599
+ - glistening after a rainstorm
1600
+ style: A detailed pencil drawing of
1601
+ subject: A patch of soft green moss
1602
+ - concept_token: sunflower field
1603
+ settings:
1604
+ - stretching to the horizon
1605
+ - under a bright summer sky
1606
+ - swaying in the warm wind
1607
+ - alive with buzzing bees
1608
+ - glowing in the evening light
1609
+ style: A grand oil painting of
1610
+ subject: A vibrant sunflower field
1611
+ - concept_token: tide pool
1612
+ settings:
1613
+ - teeming with colorful sea life
1614
+ - reflecting the sky above
1615
+ - nestled in rocky shoreline
1616
+ - surrounded by barnacles and shells
1617
+ - touched by the rising tide
1618
+ style: A realistic painting of
1619
+ subject: A lively tide pool
1620
+ - concept_token: waterfall mist
1621
+ settings:
1622
+ - rising from a powerful cascade
1623
+ - drifting through a dense forest
1624
+ - catching sunlight in a rainbow
1625
+ - cooling a rocky riverbank
1626
+ - surrounding a hidden grotto
1627
+ style: A soft pastel painting of
1628
+ subject: Mist from a waterfall
1629
+ - concept_token: aurora
1630
+ settings:
1631
+ - dancing across a polar sky
1632
+ - reflected on an icy lake
1633
+ - illuminating a snowy landscape
1634
+ - against a mountain backdrop
1635
+ - over a quiet arctic village
1636
+ style: A celestial oil painting of
1637
+ subject: A shimmering aurora borealis
1638
+ - concept_token: hedgehog
1639
+ settings:
1640
+ - sniffing among fallen leaves
1641
+ - curled up in the grass
1642
+ - exploring a forest trail
1643
+ - hiding in a hollow log
1644
+ - beside a blooming flower
1645
+ style: A charming watercolor of
1646
+ subject: A tiny hedgehog
1647
+ - concept_token: oak leaf
1648
+ settings:
1649
+ - turning golden in autumn
1650
+ - floating down a gentle stream
1651
+ - crunching underfoot on a trail
1652
+ - resting on a park bench
1653
+ - caught in a spider's web
1654
+ style: A detailed pencil drawing of
1655
+ subject: A textured oak leaf
1656
+ - concept_token: stag
1657
+ settings:
1658
+ - standing proud on a hillside
1659
+ - silhouetted against the dawn
1660
+ - walking through a frosty meadow
1661
+ - under a canopy of stars
1662
+ - by a peaceful forest stream
1663
+ style: A majestic watercolor of
1664
+ subject: A noble stag
1665
+ - concept_token: seagull
1666
+ settings:
1667
+ - soaring above crashing waves
1668
+ - perched on a rocky outcrop
1669
+ - circling a fishing boat
1670
+ - calling out over the sea
1671
+ - walking along a sandy shore
1672
+ style: A lively watercolor of
1673
+ subject: A seagull in flight
1674
+ - concept_token: lavender
1675
+ settings:
1676
+ - blooming in a sunny field
1677
+ - filling a garden with fragrance
1678
+ - tied in bundles on a table
1679
+ - swaying in a gentle breeze
1680
+ - with bees buzzing around
1681
+ style: A calming watercolor of
1682
+ subject: A cluster of lavender flowers
1683
+ - concept_token: raven
1684
+ settings:
1685
+ - perched on a barren branch
1686
+ - cawing in the early morning mist
1687
+ - gliding over a misty valley
1688
+ - standing on a rocky outcrop
1689
+ - silhouetted against the moon
1690
+ style: A mysterious oil painting of
1691
+ subject: A jet-black raven
1692
+ - concept_token: oak
1693
+ settings:
1694
+ - tall in an autumn forest
1695
+ - casting shadows at sunset
1696
+ - blanketed in fresh snow
1697
+ - in a field of wildflowers
1698
+ - against rolling hills
1699
+ - shading a peaceful clearing
1700
+ - in a lush green woodland
1701
+ style: A realist landscape painting of
1702
+ subject: A mighty oak tree with spreading branches
1703
+ - concept_token: butterfly
1704
+ settings:
1705
+ - flitting among garden flowers
1706
+ - resting on a green leaf
1707
+ - against a clear blue sky
1708
+ - hovering over a brook
1709
+ - dancing in tall grass
1710
+ - with a rainbow backdrop
1711
+ - landing on a child's hand
1712
+ style: A watercolor illustration of
1713
+ subject: A vibrant butterfly with iridescent wings
1714
+ - concept_token: snow leopard
1715
+ settings:
1716
+ - prowling in snowy mountains
1717
+ - camouflaged on rocks
1718
+ - stalking prey in the snow
1719
+ - lying on a snowy ledge
1720
+ - with wind ruffling its fur
1721
+ - in winter morning light
1722
+ - with jagged peaks behind
1723
+ style: A photorealistic painting of
1724
+ subject: A snow leopard with piercing blue eyes
1725
+ - concept_token: waterfall
1726
+ settings:
1727
+ - in a tropical forest
1728
+ - under a sunny rainbow
1729
+ - surrounded by mist
1730
+ - sunlight through canopy
1731
+ - in a mountain valley
1732
+ - falling into a clear pool
1733
+ - in morning light
1734
+ style: A romantic landscape painting of
1735
+ subject: A majestic waterfall cascading down rocky cliffs
1736
+ - concept_token: peacock
1737
+ settings:
1738
+ - strutting in a garden
1739
+ - displaying feathers fully
1740
+ - walking a garden path
1741
+ - perched on a wall
1742
+ - amid blooming flowers
1743
+ - under a vine pergola
1744
+ - on an estate lawn
1745
+ style: A detailed Art Nouveau illustration of
1746
+ subject: A magnificent peacock with iridescent feathers
1747
+ - concept_token: hummingbird
1748
+ settings:
1749
+ - in a flower garden
1750
+ - amidst honeysuckles
1751
+ - under a clear sky
1752
+ - near a backyard feeder
1753
+ - in a rainforest clearing
1754
+ - with lush foliage behind
1755
+ - during golden hour
1756
+ style: A dynamic nature photograph of
1757
+ subject: A tiny hummingbird hovering near a flower
1758
+ - concept_token: maple leaf
1759
+ settings:
1760
+ - falling from a tree
1761
+ - on a leaf-covered path
1762
+ - in a forest glade
1763
+ - glowing in autumn light
1764
+ - lying on a mossy rock
1765
+ - under a clear sky
1766
+ - drifting on a pond
1767
+ style: A traditional Japanese ink wash painting of
1768
+ subject: A bright red maple leaf in autumn
1769
+ - concept_token: dolphin
1770
+ settings:
1771
+ - in a tropical sea
1772
+ - under midday sun
1773
+ - near coral reefs
1774
+ - beside a sunset boat
1775
+ - amidst playful dolphins
1776
+ - in open ocean waves
1777
+ - near a small island
1778
+ style: A vibrant marine watercolor of
1779
+ subject: A playful dolphin leaping from the water
1780
+ - concept_token: bamboo
1781
+ settings:
1782
+ - in a Japanese garden
1783
+ - swaying in the breeze
1784
+ - against misty mountains
1785
+ - sunlight through leaves
1786
+ - in a bamboo forest
1787
+ - beside a koi pond
1788
+ - in morning fog
1789
+ style: A traditional Chinese ink painting of
1790
+ subject: A cluster of tall, graceful bamboo stalks
1791
+ - concept_token: polar bear
1792
+ settings:
1793
+ - in a glowing sunset
1794
+ - on frozen tundra edge
1795
+ - with a cub nearby
1796
+ - among floating icebergs
1797
+ - under twilight skies
1798
+ - by snow-capped mountains
1799
+ - in an Arctic storm
1800
+ style: A serene Arctic oil painting of
1801
+ subject: A majestic polar bear on a sheet of ice
1802
+ - concept_token: coral
1803
+ settings:
1804
+ - under tropical waters
1805
+ - with darting fish
1806
+ - among marine flora
1807
+ - on sandy sea floor
1808
+ - sunlight through water
1809
+ - near an underwater cave
1810
+ - swaying with seaweed
1811
+ style: A bright and lively underwater painting of
1812
+ subject: A vibrant coral reef teeming with life
1813
+ technoledge:
1814
+ - concept_token: robotic dog
1815
+ settings:
1816
+ - delivering items in a city
1817
+ - guiding visually impaired people
1818
+ - assisting in search-and-rescue missions
1819
+ - playing fetch in a park
1820
+ - patrolling a secure facility
1821
+ style: A futuristic urban illustration of
1822
+ subject: A robotic dog with sleek metallic limbs
1823
+ - concept_token: robotic horse
1824
+ settings:
1825
+ - running in a racing competition
1826
+ - assisting farmers in fieldwork
1827
+ - carrying heavy loads in rugged terrain
1828
+ - patrolling a large estate
1829
+ - providing transportation in rural areas
1830
+ style: A dynamic outdoor illustration of
1831
+ subject: A robotic horse with metallic limbs
1832
+ - concept_token: robotic rabbit
1833
+ settings:
1834
+ - delivering flowers in a botanical garden
1835
+ - entertaining children in a playroom
1836
+ - exploring urban parks with its owner
1837
+ - assisting farmers in crop monitoring
1838
+ - navigating offices delivering documents
1839
+ style: A charming nature illustration of
1840
+ subject: A robotic rabbit with fluffy ears
1841
+ - concept_token: robotic wolf
1842
+ settings:
1843
+ - guarding the perimeter of a secure facility
1844
+ - tracking wildlife in a forest
1845
+ - assisting in mountain rescues
1846
+ - participating in a futuristic military exercise
1847
+ - leading a pack of robots in exploration
1848
+ style: A mysterious wilderness illustration of
1849
+ subject: A robotic wolf with glowing eyes
1850
+ - concept_token: robotic dog
1851
+ settings:
1852
+ - delivering items in a city
1853
+ - guiding visually impaired people
1854
+ - assisting in search-and-rescue missions
1855
+ - playing fetch in a park
1856
+ - patrolling a secure facility
1857
+ style: A futuristic urban illustration of
1858
+ subject: A robotic dog
1859
+ - concept_token: robotic cheetah
1860
+ settings:
1861
+ - running in a race
1862
+ - assisting in wildlife conservation
1863
+ - tracking criminals in a city
1864
+ - patrolling a security perimeter
1865
+ - helping athletes train for speed
1866
+ style: A high-speed action illustration of
1867
+ subject: A robotic cheetah
1868
+ - concept_token: delivery robot
1869
+ settings:
1870
+ - transporting packages in a city
1871
+ - running errands in a busy street
1872
+ - delivering groceries to homes
1873
+ - navigating a warehouse for parcels
1874
+ - moving goods in a market
1875
+ style: A compact delivery illustration of
1876
+ subject: A delivery robot
1877
+ - concept_token: robotic elephant
1878
+ settings:
1879
+ - carrying heavy materials in construction
1880
+ - assisting with tree planting
1881
+ - patrolling wildlife reserves
1882
+ - providing rides for tourists
1883
+ - transporting goods over rough terrain
1884
+ style: A majestic nature illustration of
1885
+ subject: A robotic elephant
1886
+ - concept_token: robotic wolf
1887
+ settings:
1888
+ - patrolling secure areas
1889
+ - assisting in wilderness exploration
1890
+ - tracking wildlife in forests
1891
+ - guarding perimeters in urban zones
1892
+ - leading a robotic pack
1893
+ style: A sleek nature illustration of
1894
+ subject: A robotic wolf
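Every benchmark entry above follows one schema: a `concept_token`, a list of `settings`, a `style` prefix, and a `subject`. Below is a minimal sketch of how these fields are assembled into prompts, mirroring the instance construction in `resource/gen_benchmark.py` further down in this commit; the benchmark file path is a placeholder assumption, not a file shown in this diff.

```python
import yaml

# Placeholder path: point this at your copy of the benchmark YAML.
with open("resource/benchmark.yaml") as f:
    data = yaml.safe_load(f)

for subject_domain, entries in data.items():
    for entry in entries:
        # The identity prompt is simply "<style> <subject>",
        # e.g. "A futuristic urban illustration of A robotic dog".
        id_prompt = f"{entry['style']} {entry['subject']}"
        # Each setting becomes one frame prompt of the generated story.
        frame_prompt_list = entry["settings"]
        print(subject_domain, id_prompt, len(frame_prompt_list))
```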
resource/example.json ADDED
@@ -0,0 +1,160 @@
+ {
+     "combinations": [
+         {
+             "id_prompt": "A hyper-realistic digital painting of a 16 years old girl.",
+             "frame_prompt_list": [
+                 "in a flower garden",
+                 "building a sandcastle",
+                 "in a city park with autumn leaves"
+             ]
+         },
+         {
+             "id_prompt": "A vintage-style poster of a dog",
+             "frame_prompt_list": [
+                 "playing a guitar at a country concert",
+                 "sitting by a campfire under a starry sky",
+                 "riding a skateboard through a bustling city",
+                 "posing in front of a historical landmark",
+                 "wearing an astronaut suit on the moon"
+             ]
+         },
+         {
+             "id_prompt": "A photo of a dog",
+             "frame_prompt_list": [
+                 "dancing to music at a vibrant street festival",
+                 "chasing a frisbee in a colorful park",
+                 "wearing sunglasses while relaxing on a beach chair",
+                 "posing for a photoshoot in a modern art gallery",
+                 "jumping through a hoop at a circus performance",
+                 "playing with a group of children at a playground",
+                 "exploring a retro diner while wearing a bowtie"
+             ]
+         },
+         {
+             "id_prompt": "A mystical illustration of a wise wizard with a long, flowing beard",
+             "frame_prompt_list": [
+                 "in a tower filled with ancient tomes and artifacts",
+                 "casting a spell by the light of a full moon",
+                 "standing before a magical portal in the forest",
+                 "summoning a storm over a mountain peak",
+                 "writing runes in a dusty spellbook",
+                 "mixing potions in a dimly lit chamber",
+                 "consulting a crystal ball"
+             ]
+         },
+         {
+             "id_prompt": "A pixar style illustration of a dragon",
+             "frame_prompt_list": [
+                 "soaring gracefully through a rainbow sky",
+                 "nestled among blooming cherry blossoms",
+                 "playfully splashing in a sparkling lake"
+             ]
+         },
+         {
+             "id_prompt": "A whimsical painting of a delicate fairy",
+             "frame_prompt_list": [
+                 "hovering over a moonlit pond",
+                 "dancing on the petals of a giant flower",
+                 "spreading fairy dust over a sleeping village",
+                 "sitting on a mushroom in a magical forest",
+                 "playing with fireflies at dusk"
+             ]
+         },
+         {
+             "id_prompt": "A hyper-realistic digital painting of an elderly gentleman",
+             "frame_prompt_list": [
+                 "wearing a smoking jacket",
+                 "at a vintage car show",
+                 "wearing a vineyard owner's attire",
+                 "on a golf course",
+                 "at a classical music concert",
+                 "painting a landscape"
+             ]
+         },
+         {
+             "id_prompt": "A vintage-style poster of a ceramic vase with an intricate floral pattern and a glossy, sky-blue glaze",
+             "frame_prompt_list": [
+                 "holding a rare bouquet of flowers",
+                 "displaying exotic orchids",
+                 "complementing a corporate decor",
+                 "containing delicate cherry blossoms",
+                 "holding a vibrant arrangement of sunflowers",
+                 "filled with a fresh bouquet of lavender and wild daisies"
+             ]
+         },
+         {
+             "id_prompt": "A photo of a happy hedgehog with its cheese",
+             "frame_prompt_list": [
+                 "in an autumn forest",
+                 "next to a tiny cheese wheel",
+                 "sitting on a mushroom",
+                 "under a picnic blanket",
+                 "amid blooming spring flowers"
+             ]
+         },
+         {
+             "id_prompt": "A heartwarming illustration of a friendly troll",
+             "frame_prompt_list": [
+                 "under a stone bridge covered in ivy",
+                 "guarding a treasure chest in a dark cave",
+                 "helping travelers across a river",
+                 "sitting by a campfire in a foggy forest",
+                 "building a shelter from fallen logs",
+                 "fishing in a quiet stream at dusk",
+                 "carving runes into a rock",
+                 "resting under a large oak tree"
+             ]
+         },
+         {
+             "id_prompt": "A quaint illustration of a hobbit",
+             "frame_prompt_list": [
+                 "in a cozy, round door cottage",
+                 "sitting by a fireplace in a quaint home",
+                 "working in a garden of vibrant vegetables",
+                 "enjoying a feast under a starlit sky",
+                 "reading a book in a sunlit meadow",
+                 "walking through a peaceful village",
+                 "celebrating with friends in a rustic tavern",
+                 "exploring a hidden valley"
+             ]
+         },
+         {
+             "id_prompt": "A hyper-realistic digital painting of a young ginger boy with his ball",
+             "frame_prompt_list": [
+                 "leaves scattering in a gentle breeze",
+                 "standing in a quiet meadow",
+                 "set against a vibrant sunset",
+                 "in a busy street of people",
+                 "by a colorful graffiti wall",
+                 "amidst a field of blooming wildflowers"
+             ]
+         },
+         {
+             "id_prompt": "A cinematic portrait of a man and a woman standing together",
+             "frame_prompt_list": [
+                 "under a sky full of stars",
+                 "on a bustling city street at night",
+                 "in a dimly lit jazz club",
+                 "walking along a sandy beach at sunset",
+                 "in a cozy coffee shop with large windows",
+                 "in a vibrant art gallery surrounded by paintings",
+                 "under an umbrella during a soft rain",
+                 "on a quiet park bench amidst falling leaves",
+                 "standing on a rooftop overlooking the city skyline"
+             ]
+         },
+         {
+             "id_prompt": "A cinematic portrait of a man, a woman, and a child",
+             "frame_prompt_list": [
+                 "walking in a quiet park",
+                 "under a starlit sky",
+                 "by a rustic cabin",
+                 "on a forest trail",
+                 "by a peaceful lake",
+                 "at a vibrant market",
+                 "in a snowy street",
+                 "by a carousel",
+                 "on a picnic blanket"
+             ]
+         }
+     ]
+ }
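Each combination in `resource/example.json` pairs one shared identity prompt (`id_prompt`) with a list of per-frame prompts (`frame_prompt_list`), the same two arguments that `generate_images` receives in `resource/gen_benchmark.py` below. A minimal loading sketch (the file path comes from this diff; the rest is plain standard-library code):

```python
import json

with open("resource/example.json") as f:
    combinations = json.load(f)["combinations"]

for combo in combinations:
    id_prompt = combo["id_prompt"]                   # shared across all frames
    frame_prompt_list = combo["frame_prompt_list"]   # one entry per frame
    print(f"{id_prompt!r} -> {len(frame_prompt_list)} frames")
```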
resource/gen_benchmark.py ADDED
@@ -0,0 +1,110 @@
+ import argparse
+ import os
+ import queue
+ import random
+ import threading
+ import time
+
+ import torch
+ import yaml
+ from tqdm import tqdm
+
+ from main import generate_images, load_unet_controller
+ from unet import utils
+
+
+ def main_ben(unet_controller, pipe, save_dir, id_prompt, frame_prompt_list, seed, window_length):
+     unet_controller.ipca_index = -1
+     unet_controller.ipca_time_step = -1
+     os.makedirs(save_dir, exist_ok=True)
+     images, story_image = generate_images(unet_controller, pipe, id_prompt, frame_prompt_list, save_dir, window_length, seed, verbose=False)
+     return images, story_image
+
+
+ def process_instance(unet_controller, pipe, instance):
+     # Unpack the instance tuple and run one benchmark case.
+     save_dir, id_prompt, frame_prompt_list, seed, window_length = instance
+     return main_ben(unet_controller, pipe, save_dir, id_prompt, frame_prompt_list, seed, window_length)
+
+
+ def worker(device, unet_controller, pipe, task_queue, pbar):
+     # The pipe and controller passed in are already bound to `device`;
+     # each worker thread drains the shared queue until it is empty.
+     while not task_queue.empty():
+         instance = task_queue.get()
+         if instance is None:  # Sentinel value: stop the worker.
+             break
+         process_instance(unet_controller, pipe, instance)
+         print(f"Finished processing {instance[1]}")  # instance[1] is the id_prompt.
+         task_queue.task_done()
+         pbar.update(1)
+
+
+ def main():
+     parser = argparse.ArgumentParser(description="Generate story images for every instance in a benchmark YAML file.")
+     parser.add_argument('--device', type=str, choices=['cuda:0', 'cuda:1', 'cuda'], default='cuda')
+     parser.add_argument('--save_dir', type=str)
+     parser.add_argument('--benchmark_path', type=str)
+     parser.add_argument('--model_path', type=str, default='stabilityai/stable-diffusion-xl-base-1.0', help='Path to the model')
+     parser.add_argument('--precision', type=str, choices=["fp16", "fp32"], default="fp16", help='Model precision')
+     parser.add_argument('--window_length', type=int, default=10, help='Window length for story generation')
+     parser.add_argument('--num_gpus', type=int, default=2, help='Number of GPUs to use')
+     parser.add_argument('--fix_seed', type=int, default=42, help='-1 for a random seed per instance')
+     args = parser.parse_args()
+
+     # One device name per GPU; fall back to --device for a single GPU.
+     devices = [f'cuda:{i}' for i in range(args.num_gpus)]
+     if args.num_gpus == 1:
+         devices = [args.device]
+
+     # Load a pipeline and UNet controller for each device.
+     unet_controllers = {}
+     pipes = {}
+     for device in devices:
+         pipe, _ = utils.load_pipe_from_path(args.model_path, device, torch.float16 if args.precision == "fp16" else torch.float32, args.precision)
+         unet_controller = load_unet_controller(pipe, device)
+         unet_controller.Save_story_image = False
+         unet_controller.Prompt_embeds_mode = "svr-eot"
+         # unet_controller.Is_freeu_enabled = True
+         unet_controllers[device] = unet_controller
+         pipes[device] = pipe
+
+     # Load the benchmark data.
+     with open(os.path.expanduser(args.benchmark_path), 'r') as file:
+         data = yaml.safe_load(file)
+
+     instances = []
+     for subject_domain, subject_domain_instances in data.items():
+         for index, instance in enumerate(subject_domain_instances):
+             id_prompt = f'{instance["style"]} {instance["subject"]}'
+             frame_prompt_list = instance["settings"]
+             save_dir = os.path.join(args.save_dir, f"{subject_domain}_{index}")
+             if args.fix_seed != -1:
+                 seed = args.fix_seed
+             else:
+                 seed = random.randint(0, 2**32 - 1)
+             instances.append((save_dir, id_prompt, frame_prompt_list, seed, args.window_length))
+
+     # Create a task queue and populate it with instances.
+     task_queue = queue.Queue()
+     for instance in instances:
+         task_queue.put(instance)
+
+     # Initialize the tqdm progress bar.
+     pbar = tqdm(total=len(instances))
+
+     # One worker thread per device, all pulling from the shared queue.
+     threads = []
+     for device in devices:
+         unet_controller = unet_controllers[device]
+         pipe = pipes[device]
+         thread = threading.Thread(target=worker, args=(device, unet_controller, pipe, task_queue, pbar))
+         threads.append(thread)
+         thread.start()
+         time.sleep(1)  # Stagger thread start-up by one second.
+
+     # Wait for all threads to finish.
+     for thread in threads:
+         thread.join()
+
+     # Close the progress bar.
+     pbar.close()
+
+
+ if __name__ == "__main__":
+     main()
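A typical invocation, using only the flags defined above (the output and benchmark paths are placeholders): `python resource/gen_benchmark.py --save_dir result/benchmark --benchmark_path resource/benchmark.yaml --num_gpus 2 --fix_seed 42`. With `--fix_seed -1`, each instance draws a random seed instead, and the script starts one worker thread per GPU, all draining the shared task queue.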
resource/photo.gif ADDED
Git LFS Details
• SHA256: 2fdf2f318614a3315b3dd410be055daa77b8be0b2a3bc229784ea25fcc6e6d83
• Pointer size: 133 Bytes
• Size of remote file: 10 MB
unet/pipeline_stable_diffusion_xl.py ADDED
@@ -0,0 +1,1364 @@
1
+ # Copyright 2024 The HuggingFace Team. All rights reserved.
2
+ #
3
+ # Licensed under the Apache License, Version 2.0 (the "License");
4
+ # you may not use this file except in compliance with the License.
5
+ # You may obtain a copy of the License at
6
+ #
7
+ # http://www.apache.org/licenses/LICENSE-2.0
8
+ #
9
+ # Unless required by applicable law or agreed to in writing, software
10
+ # distributed under the License is distributed on an "AS IS" BASIS,
11
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12
+ # See the License for the specific language governing permissions and
13
+ # limitations under the License.
14
+
15
+ import inspect
16
+ from typing import Any, Callable, Dict, List, Optional, Tuple, Union
17
+ import torch.nn.functional as F
18
+ from torch.autograd import grad
19
+
20
+ import torch
21
+ from transformers import (
22
+ CLIPImageProcessor,
23
+ CLIPTextModel,
24
+ CLIPTextModelWithProjection,
25
+ CLIPTokenizer,
26
+ CLIPVisionModelWithProjection,
27
+ )
28
+
29
+ from diffusers.image_processor import PipelineImageInput, VaeImageProcessor
30
+ from diffusers.loaders import (
31
+ FromSingleFileMixin,
32
+ IPAdapterMixin,
33
+ StableDiffusionXLLoraLoaderMixin,
34
+ TextualInversionLoaderMixin,
35
+ )
36
+ from diffusers.models import AutoencoderKL, ImageProjection, UNet2DConditionModel
37
+ from diffusers.models.attention_processor import (
38
+ AttnProcessor2_0,
39
+ FusedAttnProcessor2_0,
40
+ LoRAAttnProcessor2_0,
41
+ LoRAXFormersAttnProcessor,
42
+ XFormersAttnProcessor,
43
+ )
44
+ from diffusers.models.lora import adjust_lora_scale_text_encoder
45
+ from diffusers.schedulers import KarrasDiffusionSchedulers
46
+ from diffusers.utils import (
47
+ USE_PEFT_BACKEND,
48
+ deprecate,
49
+ is_invisible_watermark_available,
50
+ is_torch_xla_available,
51
+ logging,
52
+ replace_example_docstring,
53
+ scale_lora_layers,
54
+ unscale_lora_layers,
55
+ )
56
+ from diffusers.utils.torch_utils import randn_tensor
57
+ from diffusers.pipelines.pipeline_utils import DiffusionPipeline, StableDiffusionMixin
58
+ from diffusers.pipelines.stable_diffusion_xl.pipeline_output import StableDiffusionXLPipelineOutput
59
+
60
+
61
+ from unet.unet_controller import UNetController
62
+ import unet.utils as utils
63
+
64
+ if is_invisible_watermark_available():
65
+ from diffusers.pipelines.stable_diffusion_xl.watermark import StableDiffusionXLWatermarker
66
+
67
+ # if is_torch_xla_available():
68
+ # import torch_xla.core.xla_model as xm
69
+
70
+ # XLA_AVAILABLE = True
71
+ # else:
72
+ # XLA_AVAILABLE = False
73
+
74
+
75
+ logger = logging.get_logger(__name__) # pylint: disable=invalid-name
76
+
77
+ EXAMPLE_DOC_STRING = """
78
+ Examples:
79
+ ```py
80
+ >>> import torch
81
+ >>> from diffusers import StableDiffusionXLPipeline
82
+
83
+ >>> pipe = StableDiffusionXLPipeline.from_pretrained(
84
+ ... "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
85
+ ... )
86
+ >>> pipe = pipe.to("cuda")
87
+
88
+ >>> prompt = "a photo of an astronaut riding a horse on mars"
89
+ >>> image = pipe(prompt).images[0]
90
+ ```
91
+ """
92
+
93
+
94
+ # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.rescale_noise_cfg
95
+ def rescale_noise_cfg(noise_cfg, noise_pred_text, guidance_rescale=0.0):
96
+ """
97
+ Rescale `noise_cfg` according to `guidance_rescale`. Based on findings of [Common Diffusion Noise Schedules and
98
+ Sample Steps are Flawed](https://arxiv.org/pdf/2305.08891.pdf). See Section 3.4
99
+ """
100
+ std_text = noise_pred_text.std(dim=list(range(1, noise_pred_text.ndim)), keepdim=True)
101
+ std_cfg = noise_cfg.std(dim=list(range(1, noise_cfg.ndim)), keepdim=True)
102
+ # rescale the results from guidance (fixes overexposure)
103
+ noise_pred_rescaled = noise_cfg * (std_text / std_cfg)
104
+ # mix with the original results from guidance by factor guidance_rescale to avoid "plain looking" images
105
+ noise_cfg = guidance_rescale * noise_pred_rescaled + (1 - guidance_rescale) * noise_cfg
106
+ return noise_cfg
107
+
108
+
109
+ # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.retrieve_timesteps
110
+ def retrieve_timesteps(
111
+ scheduler,
112
+ num_inference_steps: Optional[int] = None,
113
+ device: Optional[Union[str, torch.device]] = None,
114
+ timesteps: Optional[List[int]] = None,
115
+ **kwargs,
116
+ ):
117
+ """
118
+ Calls the scheduler's `set_timesteps` method and retrieves timesteps from the scheduler after the call. Handles
119
+ custom timesteps. Any kwargs will be supplied to `scheduler.set_timesteps`.
120
+
121
+ Args:
122
+ scheduler (`SchedulerMixin`):
123
+ The scheduler to get timesteps from.
124
+ num_inference_steps (`int`):
125
+ The number of diffusion steps used when generating samples with a pre-trained model. If used,
126
+ `timesteps` must be `None`.
127
+ device (`str` or `torch.device`, *optional*):
128
+ The device to which the timesteps should be moved to. If `None`, the timesteps are not moved.
129
+ timesteps (`List[int]`, *optional*):
130
+ Custom timesteps used to support arbitrary spacing between timesteps. If `None`, then the default
131
+ timestep spacing strategy of the scheduler is used. If `timesteps` is passed, `num_inference_steps`
132
+ must be `None`.
133
+
134
+ Returns:
135
+ `Tuple[torch.Tensor, int]`: A tuple where the first element is the timestep schedule from the scheduler and the
136
+ second element is the number of inference steps.
137
+ """
138
+ if timesteps is not None:
139
+ accepts_timesteps = "timesteps" in set(inspect.signature(scheduler.set_timesteps).parameters.keys())
140
+ if not accepts_timesteps:
141
+ raise ValueError(
142
+ f"The current scheduler class {scheduler.__class__}'s `set_timesteps` does not support custom"
143
+ f" timestep schedules. Please check whether you are using the correct scheduler."
144
+ )
145
+ scheduler.set_timesteps(timesteps=timesteps, device=device, **kwargs)
146
+ timesteps = scheduler.timesteps
147
+ num_inference_steps = len(timesteps)
148
+ else:
149
+ scheduler.set_timesteps(num_inference_steps, device=device, **kwargs)
150
+ timesteps = scheduler.timesteps
151
+ return timesteps, num_inference_steps
152
+
153
+
154
+ class StableDiffusionXLPipeline(
155
+ DiffusionPipeline,
156
+ StableDiffusionMixin,
157
+ FromSingleFileMixin,
158
+ StableDiffusionXLLoraLoaderMixin,
159
+ TextualInversionLoaderMixin,
160
+ IPAdapterMixin,
161
+ ):
162
+ r"""
163
+ Pipeline for text-to-image generation using Stable Diffusion XL.
164
+
165
+ This model inherits from [`DiffusionPipeline`]. Check the superclass documentation for the generic methods the
166
+ library implements for all the pipelines (such as downloading or saving, running on a particular device, etc.)
167
+
168
+ The pipeline also inherits the following loading methods:
169
+ - [`~loaders.TextualInversionLoaderMixin.load_textual_inversion`] for loading textual inversion embeddings
170
+ - [`~loaders.FromSingleFileMixin.from_single_file`] for loading `.ckpt` files
171
+ - [`~loaders.StableDiffusionXLLoraLoaderMixin.load_lora_weights`] for loading LoRA weights
172
+ - [`~loaders.StableDiffusionXLLoraLoaderMixin.save_lora_weights`] for saving LoRA weights
173
+ - [`~loaders.IPAdapterMixin.load_ip_adapter`] for loading IP Adapters
174
+
175
+ Args:
176
+ vae ([`AutoencoderKL`]):
177
+ Variational Auto-Encoder (VAE) Model to encode and decode images to and from latent representations.
178
+ text_encoder ([`CLIPTextModel`]):
179
+ Frozen text-encoder. Stable Diffusion XL uses the text portion of
180
+ [CLIP](https://huggingface.co/docs/transformers/model_doc/clip#transformers.CLIPTextModel), specifically
181
+ the [clip-vit-large-patch14](https://huggingface.co/openai/clip-vit-large-patch14) variant.
182
+ text_encoder_2 ([` CLIPTextModelWithProjection`]):
183
+ Second frozen text-encoder. Stable Diffusion XL uses the text and pool portion of
184
+ [CLIP](https://huggingface.co/docs/transformers/model_doc/clip#transformers.CLIPTextModelWithProjection),
185
+ specifically the
186
+ [laion/CLIP-ViT-bigG-14-laion2B-39B-b160k](https://huggingface.co/laion/CLIP-ViT-bigG-14-laion2B-39B-b160k)
187
+ variant.
188
+ tokenizer (`CLIPTokenizer`):
189
+ Tokenizer of class
190
+ [CLIPTokenizer](https://huggingface.co/docs/transformers/v4.21.0/en/model_doc/clip#transformers.CLIPTokenizer).
191
+ tokenizer_2 (`CLIPTokenizer`):
192
+ Second Tokenizer of class
193
+ [CLIPTokenizer](https://huggingface.co/docs/transformers/v4.21.0/en/model_doc/clip#transformers.CLIPTokenizer).
194
+ unet ([`UNet2DConditionModel`]): Conditional U-Net architecture to denoise the encoded image latents.
195
+ scheduler ([`SchedulerMixin`]):
196
+ A scheduler to be used in combination with `unet` to denoise the encoded image latents. Can be one of
197
+ [`DDIMScheduler`], [`LMSDiscreteScheduler`], or [`PNDMScheduler`].
198
+ force_zeros_for_empty_prompt (`bool`, *optional*, defaults to `True`):
199
+ Whether the negative prompt embeddings shall be forced to always be set to 0. Also see the config of
200
+ `stabilityai/stable-diffusion-xl-base-1.0`.
201
+ add_watermarker (`bool`, *optional*):
202
+ Whether to use the [invisible_watermark library](https://github.com/ShieldMnt/invisible-watermark/) to
203
+ watermark output images. If not defined, it will default to True if the package is installed, otherwise no
204
+ watermarker will be used.
205
+ """
206
+
207
+ model_cpu_offload_seq = "text_encoder->text_encoder_2->image_encoder->unet->vae"
208
+ _optional_components = [
209
+ "tokenizer",
210
+ "tokenizer_2",
211
+ "text_encoder",
212
+ "text_encoder_2",
213
+ "image_encoder",
214
+ "feature_extractor",
215
+ ]
216
+ _callback_tensor_inputs = [
217
+ "latents",
218
+ "prompt_embeds",
219
+ "negative_prompt_embeds",
220
+ "add_text_embeds",
221
+ "add_time_ids",
222
+ "negative_pooled_prompt_embeds",
223
+ "negative_add_time_ids",
224
+ ]
225
+
226
+ def __init__(
227
+ self,
228
+ vae: AutoencoderKL,
229
+ text_encoder: CLIPTextModel,
230
+ text_encoder_2: CLIPTextModelWithProjection,
231
+ tokenizer: CLIPTokenizer,
232
+ tokenizer_2: CLIPTokenizer,
233
+ unet: UNet2DConditionModel,
234
+ scheduler: KarrasDiffusionSchedulers,
235
+ image_encoder: CLIPVisionModelWithProjection = None,
236
+ feature_extractor: CLIPImageProcessor = None,
237
+ force_zeros_for_empty_prompt: bool = True,
238
+ add_watermarker: Optional[bool] = None,
239
+ ):
240
+ super().__init__()
241
+
242
+ self.register_modules(
243
+ vae=vae,
244
+ text_encoder=text_encoder,
245
+ text_encoder_2=text_encoder_2,
246
+ tokenizer=tokenizer,
247
+ tokenizer_2=tokenizer_2,
248
+ unet=unet,
249
+ scheduler=scheduler,
250
+ image_encoder=image_encoder,
251
+ feature_extractor=feature_extractor,
252
+ )
253
+ self.register_to_config(force_zeros_for_empty_prompt=force_zeros_for_empty_prompt)
254
+ self.vae_scale_factor = 2 ** (len(self.vae.config.block_out_channels) - 1)
255
+ self.image_processor = VaeImageProcessor(vae_scale_factor=self.vae_scale_factor)
256
+
257
+ self.default_sample_size = self.unet.config.sample_size
258
+
259
+ add_watermarker = add_watermarker if add_watermarker is not None else is_invisible_watermark_available()
260
+
261
+ if add_watermarker:
262
+ self.watermark = StableDiffusionXLWatermarker()
263
+ else:
264
+ self.watermark = None
265
+
266
+
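+ # Usage sketch (assumes the standard diffusers loading path; `controller` is a
+ # UNetController configured by this repo -- both names here are illustrative):
+ # pipe = StableDiffusionXLPipeline.from_pretrained(
+ # "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
+ # ).to("cuda")
+ # images = pipe(prompt, unet_controller=controller).images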
267
+ def encode_prompt(
268
+ self,
269
+ prompt: str,
270
+ prompt_2: Optional[str] = None,
271
+ device: Optional[torch.device] = None,
272
+ num_images_per_prompt: int = 1,
273
+ do_classifier_free_guidance: bool = True,
274
+ negative_prompt: Optional[str] = None,
275
+ negative_prompt_2: Optional[str] = None,
276
+ prompt_embeds: Optional[torch.FloatTensor] = None,
277
+ negative_prompt_embeds: Optional[torch.FloatTensor] = None,
278
+ pooled_prompt_embeds: Optional[torch.FloatTensor] = None,
279
+ negative_pooled_prompt_embeds: Optional[torch.FloatTensor] = None,
280
+ lora_scale: Optional[float] = None,
281
+ clip_skip: Optional[int] = None,
282
+ unet_controller: Optional[UNetController] = None,
283
+ ):
284
+ r"""
285
+ Encodes the prompt into text encoder hidden states.
286
+
287
+ Args:
288
+ prompt (`str` or `List[str]`, *optional*):
289
+ prompt to be encoded
290
+ prompt_2 (`str` or `List[str]`, *optional*):
291
+ The prompt or prompts to be sent to the `tokenizer_2` and `text_encoder_2`. If not defined, `prompt` is
292
+ used in both text-encoders
293
+ device: (`torch.device`):
294
+ torch device
295
+ num_images_per_prompt (`int`):
296
+ number of images that should be generated per prompt
297
+ do_classifier_free_guidance (`bool`):
298
+ whether to use classifier free guidance or not
299
+ negative_prompt (`str` or `List[str]`, *optional*):
300
+ The prompt or prompts not to guide the image generation. If not defined, one has to pass
301
+ `negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is
302
+ less than `1`).
303
+ negative_prompt_2 (`str` or `List[str]`, *optional*):
304
+ The prompt or prompts not to guide the image generation to be sent to `tokenizer_2` and
305
+ `text_encoder_2`. If not defined, `negative_prompt` is used in both text-encoders
306
+ prompt_embeds (`torch.FloatTensor`, *optional*):
307
+ Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not
308
+ provided, text embeddings will be generated from `prompt` input argument.
309
+ negative_prompt_embeds (`torch.FloatTensor`, *optional*):
310
+ Pre-generated negative text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt
311
+ weighting. If not provided, negative_prompt_embeds will be generated from `negative_prompt` input
312
+ argument.
313
+ pooled_prompt_embeds (`torch.FloatTensor`, *optional*):
314
+ Pre-generated pooled text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting.
315
+ If not provided, pooled text embeddings will be generated from `prompt` input argument.
316
+ negative_pooled_prompt_embeds (`torch.FloatTensor`, *optional*):
317
+ Pre-generated negative pooled text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt
318
+ weighting. If not provided, pooled negative_prompt_embeds will be generated from `negative_prompt`
319
+ input argument.
320
+ lora_scale (`float`, *optional*):
321
+ A lora scale that will be applied to all LoRA layers of the text encoder if LoRA layers are loaded.
322
+ clip_skip (`int`, *optional*):
323
+ Number of layers to be skipped from CLIP while computing the prompt embeddings. A value of 1 means that
324
+ the output of the pre-final layer will be used for computing the prompt embeddings.
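+ unet_controller (`UNetController`, *optional*):
+ One-Prompt-One-Story controller. When set, its frame prompt lists and
+ alpha/beta weights are used to reweight the token embeddings (see the
+ `Prompt_embeds_mode` branches below).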
325
+ """
326
+ device = device or self._execution_device
327
+
328
+ # set lora scale so that monkey patched LoRA
329
+ # function of text encoder can correctly access it
330
+ if lora_scale is not None and isinstance(self, StableDiffusionXLLoraLoaderMixin):
331
+ self._lora_scale = lora_scale
332
+
333
+ # dynamically adjust the LoRA scale
334
+ if self.text_encoder is not None:
335
+ if not USE_PEFT_BACKEND:
336
+ adjust_lora_scale_text_encoder(self.text_encoder, lora_scale)
337
+ else:
338
+ scale_lora_layers(self.text_encoder, lora_scale)
339
+
340
+ if self.text_encoder_2 is not None:
341
+ if not USE_PEFT_BACKEND:
342
+ adjust_lora_scale_text_encoder(self.text_encoder_2, lora_scale)
343
+ else:
344
+ scale_lora_layers(self.text_encoder_2, lora_scale)
345
+
346
+ prompt = [prompt] if isinstance(prompt, str) else prompt
347
+
348
+ if prompt is not None:
349
+ batch_size = len(prompt)
350
+ else:
351
+ batch_size = prompt_embeds.shape[0]
352
+
353
+ # Define tokenizers and text encoders
354
+ tokenizers = [self.tokenizer, self.tokenizer_2] if self.tokenizer is not None else [self.tokenizer_2]
355
+ text_encoders = (
356
+ [self.text_encoder, self.text_encoder_2] if self.text_encoder is not None else [self.text_encoder_2]
357
+ )
358
+
359
+ if prompt_embeds is None:
360
+ prompt_2 = prompt_2 or prompt
361
+ prompt_2 = [prompt_2] if isinstance(prompt_2, str) else prompt_2
362
+
363
+ # textual inversion: process multi-vector tokens if necessary
364
+ prompt_embeds_list = []
365
+ prompts = [prompt, prompt_2]
366
+ for prompt, tokenizer, text_encoder in zip(prompts, tokenizers, text_encoders):
367
+ if isinstance(self, TextualInversionLoaderMixin):
368
+ prompt = self.maybe_convert_prompt(prompt, tokenizer)
369
+
370
+ text_inputs = tokenizer(
371
+ prompt,
372
+ padding="max_length",
373
+ max_length=tokenizer.model_max_length,
374
+ truncation=True,
375
+ return_tensors="pt",
376
+ )
377
+
378
+ text_input_ids = text_inputs.input_ids
379
+ untruncated_ids = tokenizer(prompt, padding="longest", return_tensors="pt").input_ids
380
+
381
+ if untruncated_ids.shape[-1] >= text_input_ids.shape[-1] and not torch.equal(
382
+ text_input_ids, untruncated_ids
383
+ ):
384
+ removed_text = tokenizer.batch_decode(untruncated_ids[:, tokenizer.model_max_length - 1 : -1])
385
+ logger.warning(
386
+ "The following part of your input was truncated because CLIP can only handle sequences up to"
387
+ f" {tokenizer.model_max_length} tokens: {removed_text}"
388
+ )
389
+
390
+ prompt_embeds = text_encoder(text_input_ids.to(device), output_hidden_states=True)
391
+
392
+ # We are always only interested in the pooled output of the final text encoder
393
+ pooled_prompt_embeds = prompt_embeds[0] # [a, 1280]
394
+
395
+ if unet_controller is not None and unet_controller.frame_prompt_express is not None:
396
+ if unet_controller.Remove_pool_embeds:
397
+ pooled_prompt_embeds = pooled_prompt_embeds.zero_()
398
+
399
+ input_prompt_embeds = prompt_embeds.hidden_states[-2] if clip_skip is None else prompt_embeds.hidden_states[-(clip_skip + 2)]
400
+
401
+ alpha_weaken = unet_controller.Alpha_weaken
402
+ beta_weaken = unet_controller.Beta_weaken
403
+ alpha_strengthen = unet_controller.Alpha_enhance
404
+ beta_strengthen = unet_controller.Beta_enhance
405
+ frame_prompt_suppress = unet_controller.frame_prompt_suppress
406
+ frame_prompt_express = unet_controller.frame_prompt_express
407
+
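+ # 'svr' applies the paper's Singular-Value Reweighting: token embeddings of the
+ # suppressed frame prompts are weakened (Alpha/Beta_weaken) and those of the
+ # expressed frame prompt strengthened (Alpha/Beta_enhance), so one long prompt
+ # can drive one frame at a time; 'svr-eot' additionally applies `zero_eot` to
+ # the end-of-text tokens.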
408
+ if unet_controller.Prompt_embeds_mode == 'svr':
409
+ for movement in frame_prompt_suppress:
410
+ utils.swr_single_prompt_embeds(movement, input_prompt_embeds[0], prompt[0], unet_controller.tokenizer, alpha=alpha_weaken, beta=beta_weaken)
411
+ utils.swr_single_prompt_embeds(frame_prompt_express, input_prompt_embeds[0], prompt[0], unet_controller.tokenizer, alpha=alpha_strengthen, beta=beta_strengthen)
412
+
413
+ elif unet_controller.Prompt_embeds_mode == 'svr-eot':
414
+ for movement in frame_prompt_suppress:
415
+ utils.swr_single_prompt_embeds(movement, input_prompt_embeds[0], prompt[0], unet_controller.tokenizer, alpha=alpha_weaken, beta=beta_weaken, zero_eot=True)
416
+ utils.swr_single_prompt_embeds(frame_prompt_express, input_prompt_embeds[0], prompt[0], unet_controller.tokenizer, alpha=alpha_strengthen, beta=beta_strengthen, zero_eot=True)
417
+
418
+ elif unet_controller.Prompt_embeds_mode == "original":
419
+ pass
420
+
421
+ else:
422
+ raise ValueError(f"Invalid prompt embeds mode: {unet_controller.Prompt_embeds_mode}")
423
+
424
+ prompt_embeds = input_prompt_embeds
425
+
426
+ elif unet_controller is not None and unet_controller.frame_prompt_express_list is not None:
427
+
428
+ if unet_controller.Remove_pool_embeds:
429
+ pooled_prompt_embeds = pooled_prompt_embeds.zero_()
430
+
431
+ input_prompt_embeds = prompt_embeds.hidden_states[-2] if clip_skip is None else prompt_embeds.hidden_states[-(clip_skip + 2)]
432
+
433
+ alpha_weaken = unet_controller.Alpha_weaken
434
+ beta_weaken = unet_controller.Beta_weaken
435
+ alpha_strengthen = unet_controller.Alpha_enhance
436
+ beta_strengthen = unet_controller.Beta_enhance
437
+ frame_prompt_suppress_list = unet_controller.frame_prompt_suppress_list
438
+ frame_prompt_express_list = unet_controller.frame_prompt_express_list
439
+
440
+ for index, (frame_prompt_suppress, frame_prompt_express) in enumerate(zip(frame_prompt_suppress_list, frame_prompt_express_list)):
441
+
442
+ if unet_controller.Prompt_embeds_mode == 'svr':
443
+ for movement in frame_prompt_suppress:
444
+ utils.swr_single_prompt_embeds(movement, input_prompt_embeds[index], prompt[index], unet_controller.tokenizer, alpha=alpha_weaken, beta=beta_weaken)
445
+ utils.swr_single_prompt_embeds(frame_prompt_express, input_prompt_embeds[index], prompt[index], unet_controller.tokenizer, alpha=alpha_strengthen, beta=beta_strengthen)
446
+
447
+ elif unet_controller.Prompt_embeds_mode == "original":
448
+ pass
449
+
450
+ else:
451
+ raise ValueError(f"Invalid prompt embeds mode: {unet_controller.Prompt_embeds_mode}")
452
+
453
+ prompt_embeds = input_prompt_embeds
454
+
455
+ else: # original
456
+ if clip_skip is None:
457
+ prompt_embeds = prompt_embeds.hidden_states[-2]
458
+ else:
459
+ # "2" because SDXL always indexes from the penultimate layer.
460
+ prompt_embeds = prompt_embeds.hidden_states[-(clip_skip + 2)]
461
+
462
+ prompt_embeds_list.append(prompt_embeds)
463
+
464
+ prompt_embeds = torch.concat(prompt_embeds_list, dim=-1) # [a, 77, 2048]
465
+
466
+ # get unconditional embeddings for classifier free guidance
467
+ zero_out_negative_prompt = negative_prompt is None and self.config.force_zeros_for_empty_prompt
468
+ if do_classifier_free_guidance and negative_prompt_embeds is None and zero_out_negative_prompt:
469
+ negative_prompt_embeds = torch.zeros_like(prompt_embeds)
470
+ negative_pooled_prompt_embeds = torch.zeros_like(pooled_prompt_embeds)
471
+ elif do_classifier_free_guidance and negative_prompt_embeds is None:
472
+ negative_prompt = negative_prompt or ""
473
+ negative_prompt_2 = negative_prompt_2 or negative_prompt
474
+
475
+ # normalize str to list
476
+ negative_prompt = batch_size * [negative_prompt] if isinstance(negative_prompt, str) else negative_prompt
477
+ negative_prompt_2 = (
478
+ batch_size * [negative_prompt_2] if isinstance(negative_prompt_2, str) else negative_prompt_2
479
+ )
480
+
481
+ uncond_tokens: List[str]
482
+ if prompt is not None and type(prompt) is not type(negative_prompt):
483
+ raise TypeError(
484
+ f"`negative_prompt` should be the same type to `prompt`, but got {type(negative_prompt)} !="
485
+ f" {type(prompt)}."
486
+ )
487
+ elif batch_size != len(negative_prompt):
488
+ raise ValueError(
489
+ f"`negative_prompt`: {negative_prompt} has batch size {len(negative_prompt)}, but `prompt`:"
490
+ f" {prompt} has batch size {batch_size}. Please make sure that passed `negative_prompt` matches"
491
+ " the batch size of `prompt`."
492
+ )
493
+ else:
494
+ uncond_tokens = [negative_prompt, negative_prompt_2]
495
+
496
+ negative_prompt_embeds_list = []
497
+ for negative_prompt, tokenizer, text_encoder in zip(uncond_tokens, tokenizers, text_encoders):
498
+ if isinstance(self, TextualInversionLoaderMixin):
499
+ negative_prompt = self.maybe_convert_prompt(negative_prompt, tokenizer)
500
+
501
+ max_length = prompt_embeds.shape[1]
502
+ uncond_input = tokenizer(
503
+ negative_prompt,
504
+ padding="max_length",
505
+ max_length=max_length,
506
+ truncation=True,
507
+ return_tensors="pt",
508
+ )
509
+
510
+ negative_prompt_embeds = text_encoder(
511
+ uncond_input.input_ids.to(device),
512
+ output_hidden_states=True,
513
+ )
514
+ # We are always only interested in the pooled output of the final text encoder
515
+ negative_pooled_prompt_embeds = negative_prompt_embeds[0]
516
+ negative_prompt_embeds = negative_prompt_embeds.hidden_states[-2]
517
+
518
+ negative_prompt_embeds_list.append(negative_prompt_embeds)
519
+
520
+ negative_prompt_embeds = torch.concat(negative_prompt_embeds_list, dim=-1)
521
+
522
+ if self.text_encoder_2 is not None:
523
+ prompt_embeds = prompt_embeds.to(dtype=self.text_encoder_2.dtype, device=device)
524
+ else:
525
+ prompt_embeds = prompt_embeds.to(dtype=self.unet.dtype, device=device)
526
+
527
+ bs_embed, seq_len, _ = prompt_embeds.shape
528
+ # duplicate text embeddings for each generation per prompt, using mps friendly method
529
+ prompt_embeds = prompt_embeds.repeat(1, num_images_per_prompt, 1)
530
+ prompt_embeds = prompt_embeds.view(bs_embed * num_images_per_prompt, seq_len, -1)
531
+
532
+ if do_classifier_free_guidance:
533
+ # duplicate unconditional embeddings for each generation per prompt, using mps friendly method
534
+ seq_len = negative_prompt_embeds.shape[1]
535
+
536
+ if self.text_encoder_2 is not None:
537
+ negative_prompt_embeds = negative_prompt_embeds.to(dtype=self.text_encoder_2.dtype, device=device)
538
+ else:
539
+ negative_prompt_embeds = negative_prompt_embeds.to(dtype=self.unet.dtype, device=device)
540
+
541
+ negative_prompt_embeds = negative_prompt_embeds.repeat(1, num_images_per_prompt, 1)
542
+ negative_prompt_embeds = negative_prompt_embeds.view(batch_size * num_images_per_prompt, seq_len, -1)
543
+
544
+ pooled_prompt_embeds = pooled_prompt_embeds.repeat(1, num_images_per_prompt).view(
545
+ bs_embed * num_images_per_prompt, -1
546
+ )
547
+ if do_classifier_free_guidance:
548
+ negative_pooled_prompt_embeds = negative_pooled_prompt_embeds.repeat(1, num_images_per_prompt).view(
549
+ bs_embed * num_images_per_prompt, -1
550
+ )
551
+
552
+ if self.text_encoder is not None:
553
+ if isinstance(self, StableDiffusionXLLoraLoaderMixin) and USE_PEFT_BACKEND:
554
+ # Retrieve the original scale by scaling back the LoRA layers
555
+ unscale_lora_layers(self.text_encoder, lora_scale)
556
+
557
+ if self.text_encoder_2 is not None:
558
+ if isinstance(self, StableDiffusionXLLoraLoaderMixin) and USE_PEFT_BACKEND:
559
+ # Retrieve the original scale by scaling back the LoRA layers
560
+ unscale_lora_layers(self.text_encoder_2, lora_scale)
561
+
562
+ return prompt_embeds, negative_prompt_embeds, pooled_prompt_embeds, negative_pooled_prompt_embeds
563
+
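+ # Shapes, per the inline notes above: `prompt_embeds` is
+ # (batch * num_images_per_prompt, 77, 2048) -- both CLIP encoders concatenated --
+ # and `pooled_prompt_embeds` is (batch * num_images_per_prompt, 1280) from text_encoder_2.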
564
+ # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.encode_image
565
+ def encode_image(self, image, device, num_images_per_prompt, output_hidden_states=None):
566
+ dtype = next(self.image_encoder.parameters()).dtype
567
+
568
+ if not isinstance(image, torch.Tensor):
569
+ image = self.feature_extractor(image, return_tensors="pt").pixel_values
570
+
571
+ image = image.to(device=device, dtype=dtype)
572
+ if output_hidden_states:
573
+ image_enc_hidden_states = self.image_encoder(image, output_hidden_states=True).hidden_states[-2]
574
+ image_enc_hidden_states = image_enc_hidden_states.repeat_interleave(num_images_per_prompt, dim=0)
575
+ uncond_image_enc_hidden_states = self.image_encoder(
576
+ torch.zeros_like(image), output_hidden_states=True
577
+ ).hidden_states[-2]
578
+ uncond_image_enc_hidden_states = uncond_image_enc_hidden_states.repeat_interleave(
579
+ num_images_per_prompt, dim=0
580
+ )
581
+ return image_enc_hidden_states, uncond_image_enc_hidden_states
582
+ else:
583
+ image_embeds = self.image_encoder(image).image_embeds
584
+ image_embeds = image_embeds.repeat_interleave(num_images_per_prompt, dim=0)
585
+ uncond_image_embeds = torch.zeros_like(image_embeds)
586
+
587
+ return image_embeds, uncond_image_embeds
588
+
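+ # Illustrative use: `encode_image(pil_image, device, 1, output_hidden_states=True)`
+ # returns penultimate CLIP hidden states for the image and for a zeroed-out input,
+ # the latter acting as the negative branch for IP-Adapter classifier-free guidance.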
589
+ # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.prepare_ip_adapter_image_embeds
590
+ def prepare_ip_adapter_image_embeds(
591
+ self, ip_adapter_image, ip_adapter_image_embeds, device, num_images_per_prompt, do_classifier_free_guidance
592
+ ):
593
+ if ip_adapter_image_embeds is None:
594
+ if not isinstance(ip_adapter_image, list):
595
+ ip_adapter_image = [ip_adapter_image]
596
+
597
+ if len(ip_adapter_image) != len(self.unet.encoder_hid_proj.image_projection_layers):
598
+ raise ValueError(
599
+ f"`ip_adapter_image` must have same length as the number of IP Adapters. Got {len(ip_adapter_image)} images and {len(self.unet.encoder_hid_proj.image_projection_layers)} IP Adapters."
600
+ )
601
+
602
+ image_embeds = []
603
+ for single_ip_adapter_image, image_proj_layer in zip(
604
+ ip_adapter_image, self.unet.encoder_hid_proj.image_projection_layers
605
+ ):
606
+ output_hidden_state = not isinstance(image_proj_layer, ImageProjection)
607
+ single_image_embeds, single_negative_image_embeds = self.encode_image(
608
+ single_ip_adapter_image, device, 1, output_hidden_state
609
+ )
610
+ single_image_embeds = torch.stack([single_image_embeds] * num_images_per_prompt, dim=0)
611
+ single_negative_image_embeds = torch.stack(
612
+ [single_negative_image_embeds] * num_images_per_prompt, dim=0
613
+ )
614
+
615
+ if do_classifier_free_guidance:
616
+ single_image_embeds = torch.cat([single_negative_image_embeds, single_image_embeds])
617
+ single_image_embeds = single_image_embeds.to(device)
618
+
619
+ image_embeds.append(single_image_embeds)
620
+ else:
621
+ repeat_dims = [1]
622
+ image_embeds = []
623
+ for single_image_embeds in ip_adapter_image_embeds:
624
+ if do_classifier_free_guidance:
625
+ single_negative_image_embeds, single_image_embeds = single_image_embeds.chunk(2)
626
+ single_image_embeds = single_image_embeds.repeat(
627
+ num_images_per_prompt, *(repeat_dims * len(single_image_embeds.shape[1:]))
628
+ )
629
+ single_negative_image_embeds = single_negative_image_embeds.repeat(
630
+ num_images_per_prompt, *(repeat_dims * len(single_negative_image_embeds.shape[1:]))
631
+ )
632
+ single_image_embeds = torch.cat([single_negative_image_embeds, single_image_embeds])
633
+ else:
634
+ single_image_embeds = single_image_embeds.repeat(
635
+ num_images_per_prompt, *(repeat_dims * len(single_image_embeds.shape[1:]))
636
+ )
637
+ image_embeds.append(single_image_embeds)
638
+
639
+ return image_embeds
640
+
641
+ # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.prepare_extra_step_kwargs
642
+ def prepare_extra_step_kwargs(self, generator, eta):
643
+ # prepare extra kwargs for the scheduler step, since not all schedulers have the same signature
644
+ # eta (η) is only used with the DDIMScheduler, it will be ignored for other schedulers.
645
+ # eta corresponds to η in DDIM paper: https://arxiv.org/abs/2010.02502
646
+ # and should be between [0, 1]
647
+
648
+ accepts_eta = "eta" in set(inspect.signature(self.scheduler.step).parameters.keys())
649
+ extra_step_kwargs = {}
650
+ if accepts_eta:
651
+ extra_step_kwargs["eta"] = eta
652
+
653
+ # check if the scheduler accepts generator
654
+ accepts_generator = "generator" in set(inspect.signature(self.scheduler.step).parameters.keys())
655
+ if accepts_generator:
656
+ extra_step_kwargs["generator"] = generator
657
+ return extra_step_kwargs
658
+
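+ # Only arguments that the scheduler's `step()` signature actually accepts are
+ # forwarded; e.g. DDIMScheduler takes both `eta` and `generator`, while many
+ # other schedulers accept only `generator` (or neither).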
659
+ def check_inputs(
660
+ self,
661
+ prompt,
662
+ prompt_2,
663
+ height,
664
+ width,
665
+ callback_steps,
666
+ negative_prompt=None,
667
+ negative_prompt_2=None,
668
+ prompt_embeds=None,
669
+ negative_prompt_embeds=None,
670
+ pooled_prompt_embeds=None,
671
+ negative_pooled_prompt_embeds=None,
672
+ ip_adapter_image=None,
673
+ ip_adapter_image_embeds=None,
674
+ callback_on_step_end_tensor_inputs=None,
675
+ ):
676
+ if height % 8 != 0 or width % 8 != 0:
677
+ raise ValueError(f"`height` and `width` have to be divisible by 8 but are {height} and {width}.")
678
+
679
+ if callback_steps is not None and (not isinstance(callback_steps, int) or callback_steps <= 0):
680
+ raise ValueError(
681
+ f"`callback_steps` has to be a positive integer but is {callback_steps} of type"
682
+ f" {type(callback_steps)}."
683
+ )
684
+
685
+ if callback_on_step_end_tensor_inputs is not None and not all(
686
+ k in self._callback_tensor_inputs for k in callback_on_step_end_tensor_inputs
687
+ ):
688
+ raise ValueError(
689
+ f"`callback_on_step_end_tensor_inputs` has to be in {self._callback_tensor_inputs}, but found {[k for k in callback_on_step_end_tensor_inputs if k not in self._callback_tensor_inputs]}"
690
+ )
691
+
692
+ if prompt is not None and prompt_embeds is not None:
693
+ raise ValueError(
694
+ f"Cannot forward both `prompt`: {prompt} and `prompt_embeds`: {prompt_embeds}. Please make sure to"
695
+ " only forward one of the two."
696
+ )
697
+ elif prompt_2 is not None and prompt_embeds is not None:
698
+ raise ValueError(
699
+ f"Cannot forward both `prompt_2`: {prompt_2} and `prompt_embeds`: {prompt_embeds}. Please make sure to"
700
+ " only forward one of the two."
701
+ )
702
+ elif prompt is None and prompt_embeds is None:
703
+ raise ValueError(
704
+ "Provide either `prompt` or `prompt_embeds`. Cannot leave both `prompt` and `prompt_embeds` undefined."
705
+ )
706
+ elif prompt is not None and (not isinstance(prompt, str) and not isinstance(prompt, list)):
707
+ raise ValueError(f"`prompt` has to be of type `str` or `list` but is {type(prompt)}")
708
+ elif prompt_2 is not None and (not isinstance(prompt_2, str) and not isinstance(prompt_2, list)):
709
+ raise ValueError(f"`prompt_2` has to be of type `str` or `list` but is {type(prompt_2)}")
710
+
711
+ if negative_prompt is not None and negative_prompt_embeds is not None:
712
+ raise ValueError(
713
+ f"Cannot forward both `negative_prompt`: {negative_prompt} and `negative_prompt_embeds`:"
714
+ f" {negative_prompt_embeds}. Please make sure to only forward one of the two."
715
+ )
716
+ elif negative_prompt_2 is not None and negative_prompt_embeds is not None:
717
+ raise ValueError(
718
+ f"Cannot forward both `negative_prompt_2`: {negative_prompt_2} and `negative_prompt_embeds`:"
719
+ f" {negative_prompt_embeds}. Please make sure to only forward one of the two."
720
+ )
721
+
722
+ if prompt_embeds is not None and negative_prompt_embeds is not None:
723
+ if prompt_embeds.shape != negative_prompt_embeds.shape:
724
+ raise ValueError(
725
+ "`prompt_embeds` and `negative_prompt_embeds` must have the same shape when passed directly, but"
726
+ f" got: `prompt_embeds` {prompt_embeds.shape} != `negative_prompt_embeds`"
727
+ f" {negative_prompt_embeds.shape}."
728
+ )
729
+
730
+ if prompt_embeds is not None and pooled_prompt_embeds is None:
731
+ raise ValueError(
732
+ "If `prompt_embeds` are provided, `pooled_prompt_embeds` also have to be passed. Make sure to generate `pooled_prompt_embeds` from the same text encoder that was used to generate `prompt_embeds`."
733
+ )
734
+
735
+ if negative_prompt_embeds is not None and negative_pooled_prompt_embeds is None:
736
+ raise ValueError(
737
+ "If `negative_prompt_embeds` are provided, `negative_pooled_prompt_embeds` also have to be passed. Make sure to generate `negative_pooled_prompt_embeds` from the same text encoder that was used to generate `negative_prompt_embeds`."
738
+ )
739
+
740
+ if ip_adapter_image is not None and ip_adapter_image_embeds is not None:
741
+ raise ValueError(
742
+ "Provide either `ip_adapter_image` or `ip_adapter_image_embeds`. Cannot leave both `ip_adapter_image` and `ip_adapter_image_embeds` defined."
743
+ )
744
+
745
+ if ip_adapter_image_embeds is not None:
746
+ if not isinstance(ip_adapter_image_embeds, list):
747
+ raise ValueError(
748
+ f"`ip_adapter_image_embeds` has to be of type `list` but is {type(ip_adapter_image_embeds)}"
749
+ )
750
+ elif ip_adapter_image_embeds[0].ndim not in [3, 4]:
751
+ raise ValueError(
752
+ f"`ip_adapter_image_embeds` has to be a list of 3D or 4D tensors but is {ip_adapter_image_embeds[0].ndim}D"
753
+ )
754
+
755
+ # Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline.prepare_latents
756
+ def prepare_latents(self, batch_size, num_channels_latents, height, width, dtype, device, generator, latents=None, same=False):
757
+ shape = (batch_size, num_channels_latents, height // self.vae_scale_factor, width // self.vae_scale_factor)
758
+ if isinstance(generator, list) and len(generator) != batch_size:
759
+ raise ValueError(
760
+ f"You have passed a list of generators of length {len(generator)}, but requested an effective batch"
761
+ f" size of {batch_size}. Make sure the batch size matches the length of the generators."
762
+ )
763
+
764
+ if latents is None:
765
+ latents = randn_tensor(shape, generator=generator, device=device, dtype=dtype)
766
+ if same: # make every prompt in the batch start from the same initial latent
767
+ latents_pivot = latents[0]
768
+ for index in range(shape[0] - 1):
769
+ latents[index+1] = latents_pivot
770
+ else:
771
+ latents = latents.to(device)
772
+
773
+ # print(torch.mean(latents[0]))
774
+ # scale the initial noise by the standard deviation required by the scheduler
775
+ latents = latents * self.scheduler.init_noise_sigma
776
+ return latents
777
+
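+ # With `same=True`, every element of the batch is denoised from one shared
+ # initial noise tensor; this is what keeps the story frames aligned when
+ # `unet_controller.Use_same_latents` is enabled (see `__call__`, step 5).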
778
+ def _get_add_time_ids(
779
+ self, original_size, crops_coords_top_left, target_size, dtype, text_encoder_projection_dim=None
780
+ ):
781
+ add_time_ids = list(original_size + crops_coords_top_left + target_size)
782
+
783
+ passed_add_embed_dim = (
784
+ self.unet.config.addition_time_embed_dim * len(add_time_ids) + text_encoder_projection_dim
785
+ )
786
+ expected_add_embed_dim = self.unet.add_embedding.linear_1.in_features
787
+
788
+ if expected_add_embed_dim != passed_add_embed_dim:
789
+ raise ValueError(
790
+ f"Model expects an added time embedding vector of length {expected_add_embed_dim}, but a vector of {passed_add_embed_dim} was created. The model has an incorrect config. Please check `unet.config.time_embedding_type` and `text_encoder_2.config.projection_dim`."
791
+ )
792
+
793
+ add_time_ids = torch.tensor([add_time_ids], dtype=dtype)
794
+ return add_time_ids
795
+
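+ # Example: original_size=(1024, 1024), crops_coords_top_left=(0, 0) and
+ # target_size=(1024, 1024) yield tensor([[1024, 1024, 0, 0, 1024, 1024]]) --
+ # the six micro-conditioning values SDXL's added time embedding expects.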
796
+ def upcast_vae(self):
797
+ dtype = self.vae.dtype
798
+ self.vae.to(dtype=torch.float32)
799
+ use_torch_2_0_or_xformers = isinstance(
800
+ self.vae.decoder.mid_block.attentions[0].processor,
801
+ (
802
+ AttnProcessor2_0,
803
+ XFormersAttnProcessor,
804
+ LoRAXFormersAttnProcessor,
805
+ LoRAAttnProcessor2_0,
806
+ FusedAttnProcessor2_0,
807
+ ),
808
+ )
809
+ # if xformers or torch_2_0 is used attention block does not need
810
+ # to be in float32 which can save lots of memory
811
+ if use_torch_2_0_or_xformers:
812
+ self.vae.post_quant_conv.to(dtype)
813
+ self.vae.decoder.conv_in.to(dtype)
814
+ self.vae.decoder.mid_block.to(dtype)
815
+
816
+ # Copied from diffusers.pipelines.latent_consistency_models.pipeline_latent_consistency_text2img.LatentConsistencyModelPipeline.get_guidance_scale_embedding
817
+ def get_guidance_scale_embedding(self, w, embedding_dim=512, dtype=torch.float32):
818
+ """
819
+ See https://github.com/google-research/vdm/blob/dc27b98a554f65cdc654b800da5aa1846545d41b/model_vdm.py#L298
820
+
821
+ Args:
822
+ timesteps (`torch.Tensor`):
823
+ generate embedding vectors at these timesteps
824
+ embedding_dim (`int`, *optional*, defaults to 512):
825
+ dimension of the embeddings to generate
826
+ dtype:
827
+ data type of the generated embeddings
828
+
829
+ Returns:
830
+ `torch.FloatTensor`: Embedding vectors with shape `(len(timesteps), embedding_dim)`
831
+ """
832
+ assert len(w.shape) == 1
833
+ w = w * 1000.0
834
+
835
+ half_dim = embedding_dim // 2
836
+ emb = torch.log(torch.tensor(10000.0)) / (half_dim - 1)
837
+ emb = torch.exp(torch.arange(half_dim, dtype=dtype) * -emb)
838
+ emb = w.to(dtype)[:, None] * emb[None, :]
839
+ emb = torch.cat([torch.sin(emb), torch.cos(emb)], dim=1)
840
+ if embedding_dim % 2 == 1: # zero pad
841
+ emb = torch.nn.functional.pad(emb, (0, 1))
842
+ assert emb.shape == (w.shape[0], embedding_dim)
843
+ return emb
844
+
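+ # Illustrative call: w = torch.tensor([7.5]) with embedding_dim=256 returns a
+ # (1, 256) sinusoidal embedding of w * 1000; it is only used by UNets with
+ # `time_cond_proj_dim` set (e.g. guidance-distilled, LCM-style models).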
845
+ @property
846
+ def guidance_scale(self):
847
+ return self._guidance_scale
848
+
849
+ @property
850
+ def guidance_rescale(self):
851
+ return self._guidance_rescale
852
+
853
+ @property
854
+ def clip_skip(self):
855
+ return self._clip_skip
856
+
857
+ # here `guidance_scale` is defined analog to the guidance weight `w` of equation (2)
858
+ # of the Imagen paper: https://arxiv.org/pdf/2205.11487.pdf . `guidance_scale = 1`
859
+ # corresponds to doing no classifier free guidance.
860
+ @property
861
+ def do_classifier_free_guidance(self):
862
+ return self._guidance_scale > 1 and self.unet.config.time_cond_proj_dim is None
863
+
864
+ @property
865
+ def cross_attention_kwargs(self):
866
+ return self._cross_attention_kwargs
867
+
868
+ @property
869
+ def denoising_end(self):
870
+ return self._denoising_end
871
+
872
+ @property
873
+ def num_timesteps(self):
874
+ return self._num_timesteps
875
+
876
+ @property
877
+ def interrupt(self):
878
+ return self._interrupt
879
+
880
+ @torch.no_grad()
881
+ @replace_example_docstring(EXAMPLE_DOC_STRING)
882
+ def __call__(
883
+ self,
884
+ prompt: Union[str, List[str]] = None,
885
+ prompt_2: Optional[Union[str, List[str]]] = None,
886
+ height: Optional[int] = None,
887
+ width: Optional[int] = None,
888
+ num_inference_steps: int = 50,
889
+ timesteps: List[int] = None,
890
+ denoising_end: Optional[float] = None,
891
+ guidance_scale: float = 5.0,
892
+ negative_prompt: Optional[Union[str, List[str]]] = None,
893
+ negative_prompt_2: Optional[Union[str, List[str]]] = None,
894
+ num_images_per_prompt: Optional[int] = 1,
895
+ eta: float = 0.0,
896
+ generator: Optional[Union[torch.Generator, List[torch.Generator]]] = None,
897
+ latents: Optional[torch.FloatTensor] = None,
898
+ prompt_embeds: Optional[torch.FloatTensor] = None,
899
+ negative_prompt_embeds: Optional[torch.FloatTensor] = None,
900
+ pooled_prompt_embeds: Optional[torch.FloatTensor] = None,
901
+ negative_pooled_prompt_embeds: Optional[torch.FloatTensor] = None,
902
+ ip_adapter_image: Optional[PipelineImageInput] = None,
903
+ ip_adapter_image_embeds: Optional[List[torch.FloatTensor]] = None,
904
+ output_type: Optional[str] = "pil",
905
+ return_dict: bool = True,
906
+ cross_attention_kwargs: Optional[Dict[str, Any]] = None,
907
+ guidance_rescale: float = 0.0,
908
+ original_size: Optional[Tuple[int, int]] = None,
909
+ crops_coords_top_left: Tuple[int, int] = (0, 0),
910
+ target_size: Optional[Tuple[int, int]] = None,
911
+ negative_original_size: Optional[Tuple[int, int]] = None,
912
+ negative_crops_coords_top_left: Tuple[int, int] = (0, 0),
913
+ negative_target_size: Optional[Tuple[int, int]] = None,
914
+ clip_skip: Optional[int] = None,
915
+ callback_on_step_end: Optional[Callable[[int, int, Dict], None]] = None,
916
+ callback_on_step_end_tensor_inputs: List[str] = ["latents"],
917
+ unet_controller: Optional[UNetController] = None,
918
+ **kwargs,
919
+ ):
920
+ r"""
921
+ Function invoked when calling the pipeline for generation.
922
+
923
+ Args:
924
+ prompt (`str` or `List[str]`, *optional*):
925
+ The prompt or prompts to guide the image generation. If not defined, one has to pass `prompt_embeds`
+ instead.
927
+ prompt_2 (`str` or `List[str]`, *optional*):
928
+ The prompt or prompts to be sent to the `tokenizer_2` and `text_encoder_2`. If not defined, `prompt` is
929
+ used in both text-encoders
930
+ height (`int`, *optional*, defaults to self.unet.config.sample_size * self.vae_scale_factor):
931
+ The height in pixels of the generated image. This is set to 1024 by default for the best results.
932
+ Anything below 512 pixels won't work well for
933
+ [stabilityai/stable-diffusion-xl-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)
934
+ and checkpoints that are not specifically fine-tuned on low resolutions.
935
+ width (`int`, *optional*, defaults to self.unet.config.sample_size * self.vae_scale_factor):
936
+ The width in pixels of the generated image. This is set to 1024 by default for the best results.
937
+ Anything below 512 pixels won't work well for
938
+ [stabilityai/stable-diffusion-xl-base-1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)
939
+ and checkpoints that are not specifically fine-tuned on low resolutions.
940
+ num_inference_steps (`int`, *optional*, defaults to 50):
941
+ The number of denoising steps. More denoising steps usually lead to a higher quality image at the
942
+ expense of slower inference.
943
+ timesteps (`List[int]`, *optional*):
944
+ Custom timesteps to use for the denoising process with schedulers which support a `timesteps` argument
945
+ in their `set_timesteps` method. If not defined, the default behavior when `num_inference_steps` is
946
+ passed will be used. Must be in descending order.
947
+ denoising_end (`float`, *optional*):
948
+ When specified, determines the fraction (between 0.0 and 1.0) of the total denoising process to be
949
+ completed before it is intentionally prematurely terminated. As a result, the returned sample will
950
+ still retain a substantial amount of noise as determined by the discrete timesteps selected by the
951
+ scheduler. The denoising_end parameter should ideally be utilized when this pipeline forms a part of a
952
+ "Mixture of Denoisers" multi-pipeline setup, as elaborated in [**Refining the Image
953
+ Output**](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/stable_diffusion_xl#refining-the-image-output)
954
+ guidance_scale (`float`, *optional*, defaults to 5.0):
955
+ Guidance scale as defined in [Classifier-Free Diffusion Guidance](https://arxiv.org/abs/2207.12598).
956
+ `guidance_scale` is defined as `w` in equation 2 of [Imagen
957
+ Paper](https://arxiv.org/pdf/2205.11487.pdf). Guidance scale is enabled by setting `guidance_scale >
958
+ 1`. A higher guidance scale encourages the model to generate images closely linked to the text `prompt`,
959
+ usually at the expense of lower image quality.
960
+ negative_prompt (`str` or `List[str]`, *optional*):
961
+ The prompt or prompts not to guide the image generation. If not defined, one has to pass
962
+ `negative_prompt_embeds` instead. Ignored when not using guidance (i.e., ignored if `guidance_scale` is
963
+ less than `1`).
964
+ negative_prompt_2 (`str` or `List[str]`, *optional*):
965
+ The prompt or prompts not to guide the image generation to be sent to `tokenizer_2` and
966
+ `text_encoder_2`. If not defined, `negative_prompt` is used in both text-encoders
967
+ num_images_per_prompt (`int`, *optional*, defaults to 1):
968
+ The number of images to generate per prompt.
969
+ eta (`float`, *optional*, defaults to 0.0):
970
+ Corresponds to parameter eta (η) in the DDIM paper: https://arxiv.org/abs/2010.02502. Only applies to
971
+ [`schedulers.DDIMScheduler`], will be ignored for others.
972
+ generator (`torch.Generator` or `List[torch.Generator]`, *optional*):
973
+ One or a list of [torch generator(s)](https://pytorch.org/docs/stable/generated/torch.Generator.html)
974
+ to make generation deterministic.
975
+ latents (`torch.FloatTensor`, *optional*):
976
+ Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image
977
+ generation. Can be used to tweak the same generation with different prompts. If not provided, a latents
978
+ tensor will be generated by sampling using the supplied random `generator`.
979
+ prompt_embeds (`torch.FloatTensor`, *optional*):
980
+ Pre-generated text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting. If not
981
+ provided, text embeddings will be generated from `prompt` input argument.
982
+ negative_prompt_embeds (`torch.FloatTensor`, *optional*):
983
+ Pre-generated negative text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt
984
+ weighting. If not provided, negative_prompt_embeds will be generated from `negative_prompt` input
985
+ argument.
986
+ pooled_prompt_embeds (`torch.FloatTensor`, *optional*):
987
+ Pre-generated pooled text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt weighting.
988
+ If not provided, pooled text embeddings will be generated from `prompt` input argument.
989
+ negative_pooled_prompt_embeds (`torch.FloatTensor`, *optional*):
990
+ Pre-generated negative pooled text embeddings. Can be used to easily tweak text inputs, *e.g.* prompt
991
+ weighting. If not provided, pooled negative_prompt_embeds will be generated from `negative_prompt`
992
+ input argument.
993
+ ip_adapter_image: (`PipelineImageInput`, *optional*): Optional image input to work with IP Adapters.
994
+ ip_adapter_image_embeds (`List[torch.FloatTensor]`, *optional*):
995
+ Pre-generated image embeddings for IP-Adapter. It should be a list with the same length as the number of IP Adapters.
996
+ Each element should be a tensor of shape `(batch_size, num_images, emb_dim)`. It should contain the negative image embedding
997
+ if `do_classifier_free_guidance` is set to `True`.
998
+ If not provided, embeddings are computed from the `ip_adapter_image` input argument.
999
+ output_type (`str`, *optional*, defaults to `"pil"`):
1000
+ The output format of the generated image. Choose between
1001
+ [PIL](https://pillow.readthedocs.io/en/stable/): `PIL.Image.Image` or `np.array`.
1002
+ return_dict (`bool`, *optional*, defaults to `True`):
1003
+ Whether or not to return a [`~pipelines.stable_diffusion_xl.StableDiffusionXLPipelineOutput`] instead
1004
+ of a plain tuple.
1005
+ cross_attention_kwargs (`dict`, *optional*):
1006
+ A kwargs dictionary that if specified is passed along to the `AttentionProcessor` as defined under
1007
+ `self.processor` in
1008
+ [diffusers.models.attention_processor](https://github.com/huggingface/diffusers/blob/main/src/diffusers/models/attention_processor.py).
1009
+ guidance_rescale (`float`, *optional*, defaults to 0.0):
1010
+ Guidance rescale factor proposed by [Common Diffusion Noise Schedules and Sample Steps are
1011
+ Flawed](https://arxiv.org/pdf/2305.08891.pdf) `guidance_scale` is defined as `φ` in equation 16. of
1012
+ [Common Diffusion Noise Schedules and Sample Steps are Flawed](https://arxiv.org/pdf/2305.08891.pdf).
1013
+ Guidance rescale factor should fix overexposure when using zero terminal SNR.
1014
+ original_size (`Tuple[int]`, *optional*, defaults to (1024, 1024)):
1015
+ If `original_size` is not the same as `target_size` the image will appear to be down- or upsampled.
1016
+ `original_size` defaults to `(height, width)` if not specified. Part of SDXL's micro-conditioning as
1017
+ explained in section 2.2 of
1018
+ [https://huggingface.co/papers/2307.01952](https://huggingface.co/papers/2307.01952).
1019
+ crops_coords_top_left (`Tuple[int]`, *optional*, defaults to (0, 0)):
1020
+ `crops_coords_top_left` can be used to generate an image that appears to be "cropped" from the position
1021
+ `crops_coords_top_left` downwards. Favorable, well-centered images are usually achieved by setting
1022
+ `crops_coords_top_left` to (0, 0). Part of SDXL's micro-conditioning as explained in section 2.2 of
1023
+ [https://huggingface.co/papers/2307.01952](https://huggingface.co/papers/2307.01952).
1024
+ target_size (`Tuple[int]`, *optional*, defaults to (1024, 1024)):
1025
+ For most cases, `target_size` should be set to the desired height and width of the generated image. If
1026
+ not specified it will default to `(height, width)`. Part of SDXL's micro-conditioning as explained in
1027
+ section 2.2 of [https://huggingface.co/papers/2307.01952](https://huggingface.co/papers/2307.01952).
1028
+ negative_original_size (`Tuple[int]`, *optional*, defaults to (1024, 1024)):
1029
+ To negatively condition the generation process based on a specific image resolution. Part of SDXL's
1030
+ micro-conditioning as explained in section 2.2 of
1031
+ [https://huggingface.co/papers/2307.01952](https://huggingface.co/papers/2307.01952). For more
1032
+ information, refer to this issue thread: https://github.com/huggingface/diffusers/issues/4208.
1033
+ negative_crops_coords_top_left (`Tuple[int]`, *optional*, defaults to (0, 0)):
1034
+ To negatively condition the generation process based on a specific crop coordinates. Part of SDXL's
1035
+ micro-conditioning as explained in section 2.2 of
1036
+ [https://huggingface.co/papers/2307.01952](https://huggingface.co/papers/2307.01952). For more
1037
+ information, refer to this issue thread: https://github.com/huggingface/diffusers/issues/4208.
1038
+ negative_target_size (`Tuple[int]`, *optional*, defaults to (1024, 1024)):
1039
+ To negatively condition the generation process based on a target image resolution. It should be the same
+ as the `target_size` in most cases. Part of SDXL's micro-conditioning as explained in section 2.2 of
1041
+ [https://huggingface.co/papers/2307.01952](https://huggingface.co/papers/2307.01952). For more
1042
+ information, refer to this issue thread: https://github.com/huggingface/diffusers/issues/4208.
1043
+ callback_on_step_end (`Callable`, *optional*):
1044
+ A function that is called at the end of each denoising step during inference. The function is called
1045
+ with the following arguments: `callback_on_step_end(self: DiffusionPipeline, step: int, timestep: int,
1046
+ callback_kwargs: Dict)`. `callback_kwargs` will include a list of all tensors as specified by
1047
+ `callback_on_step_end_tensor_inputs`.
1048
+ callback_on_step_end_tensor_inputs (`List`, *optional*):
1049
+ The list of tensor inputs for the `callback_on_step_end` function. The tensors specified in the list
1050
+ will be passed as `callback_kwargs` argument. You will only be able to include variables listed in the
1051
+ `._callback_tensor_inputs` attribute of your pipeline class.
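+ unet_controller (`UNetController`, *optional*):
+ One-Prompt-One-Story controller threaded through `encode_prompt` and the
+ UNet forward passes; it also decides whether all prompts share the same
+ initial latents (`Use_same_latents`).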
1052
+
1053
+ Examples:
1054
+
1055
+ Returns:
1056
+ [`~pipelines.stable_diffusion_xl.StableDiffusionXLPipelineOutput`] or `tuple`:
1057
+ [`~pipelines.stable_diffusion_xl.StableDiffusionXLPipelineOutput`] if `return_dict` is True, otherwise a
1058
+ `tuple`. When returning a tuple, the first element is a list with the generated images.
1059
+ """
1060
+
1061
+ callback = kwargs.pop("callback", None)
1062
+ callback_steps = kwargs.pop("callback_steps", None)
1063
+
1064
+ if callback is not None:
1065
+ deprecate(
1066
+ "callback",
1067
+ "1.0.0",
1068
+ "Passing `callback` as an input argument to `__call__` is deprecated, consider use `callback_on_step_end`",
1069
+ )
1070
+ if callback_steps is not None:
1071
+ deprecate(
1072
+ "callback_steps",
1073
+ "1.0.0",
1074
+ "Passing `callback_steps` as an input argument to `__call__` is deprecated, consider use `callback_on_step_end`",
1075
+ )
1076
+
1077
+ # 0. Default height and width to unet
1078
+ height = height or self.default_sample_size * self.vae_scale_factor
1079
+ width = width or self.default_sample_size * self.vae_scale_factor
1080
+
1081
+ original_size = original_size or (height, width)
1082
+ target_size = target_size or (height, width)
1083
+
1084
+ # 1. Check inputs. Raise error if not correct
1085
+ self.check_inputs(
1086
+ prompt,
1087
+ prompt_2,
1088
+ height,
1089
+ width,
1090
+ callback_steps,
1091
+ negative_prompt,
1092
+ negative_prompt_2,
1093
+ prompt_embeds,
1094
+ negative_prompt_embeds,
1095
+ pooled_prompt_embeds,
1096
+ negative_pooled_prompt_embeds,
1097
+ ip_adapter_image,
1098
+ ip_adapter_image_embeds,
1099
+ callback_on_step_end_tensor_inputs,
1100
+ )
1101
+
1102
+ self._guidance_scale = guidance_scale
1103
+ self._guidance_rescale = guidance_rescale
1104
+ self._clip_skip = clip_skip
1105
+ self._cross_attention_kwargs = cross_attention_kwargs
1106
+ self._denoising_end = denoising_end
1107
+ self._interrupt = False
1108
+
1109
+ # 1.1 Set unet_controller parameters
+ if unet_controller is not None:
+ if prompt_2 is not None or negative_prompt_2 is not None:
+ raise ValueError("`prompt_2` and `negative_prompt_2` are not currently supported when a `unet_controller` is passed.")
1113
+ unet_controller.do_classifier_free_guidance = self.do_classifier_free_guidance
1114
+ if not isinstance(prompt, list):
1115
+ prompt_ = [prompt]
1116
+ else:
1117
+ prompt_ = prompt
1118
+ if not isinstance(negative_prompt, list):
1119
+ negative_prompt_ = [negative_prompt]
1120
+ else:
1121
+ negative_prompt_ = negative_prompt
1122
+ unet_controller.prompts = prompt_
1123
+ unet_controller.negative_prompt = negative_prompt_
1124
+
1125
+ # 2. Define call parameters
1126
+ if prompt is not None and isinstance(prompt, str):
1127
+ batch_size = 1
1128
+ elif prompt is not None and isinstance(prompt, list):
1129
+ batch_size = len(prompt)
1130
+ else:
1131
+ batch_size = prompt_embeds.shape[0]
1132
+
1133
+ device = self._execution_device
1134
+
1135
+ # 3. Encode input prompt
1136
+ lora_scale = (
1137
+ self.cross_attention_kwargs.get("scale", None) if self.cross_attention_kwargs is not None else None
1138
+ )
1139
+
1140
+ (
1141
+ prompt_embeds,
1142
+ negative_prompt_embeds,
1143
+ pooled_prompt_embeds,
1144
+ negative_pooled_prompt_embeds,
1145
+ ) = self.encode_prompt(
1146
+ prompt=prompt,
1147
+ prompt_2=prompt_2,
1148
+ device=device,
1149
+ num_images_per_prompt=num_images_per_prompt,
1150
+ do_classifier_free_guidance=self.do_classifier_free_guidance,
1151
+ negative_prompt=negative_prompt,
1152
+ negative_prompt_2=negative_prompt_2,
1153
+ prompt_embeds=prompt_embeds,
1154
+ negative_prompt_embeds=negative_prompt_embeds,
1155
+ pooled_prompt_embeds=pooled_prompt_embeds,
1156
+ negative_pooled_prompt_embeds=negative_pooled_prompt_embeds,
1157
+ lora_scale=lora_scale,
1158
+ clip_skip=self.clip_skip,
1159
+ unet_controller=unet_controller,
1160
+ )
1161
+
1162
+ # 4. Prepare timesteps
1163
+ timesteps, num_inference_steps = retrieve_timesteps(self.scheduler, num_inference_steps, device, timesteps)
1164
+
1165
+ # 5. Prepare latent variables
1166
+ num_channels_latents = self.unet.config.in_channels
1167
+ latents = self.prepare_latents(
1168
+ batch_size * num_images_per_prompt,
1169
+ num_channels_latents,
1170
+ height,
1171
+ width,
1172
+ prompt_embeds.dtype,
1173
+ device,
1174
+ generator,
1175
+ latents,
1176
+ same=(unet_controller.Use_same_latents if unet_controller is not None else False)
1177
+ )
1178
+
1179
+ # 6. Prepare extra step kwargs. TODO: Logic should ideally just be moved out of the pipeline
1180
+ extra_step_kwargs = self.prepare_extra_step_kwargs(generator, eta)
1181
+
1182
+ # 7. Prepare added time ids & embeddings
1183
+ add_text_embeds = pooled_prompt_embeds
1184
+ if self.text_encoder_2 is None:
1185
+ text_encoder_projection_dim = int(pooled_prompt_embeds.shape[-1])
1186
+ else:
1187
+ text_encoder_projection_dim = self.text_encoder_2.config.projection_dim
1188
+
1189
+ add_time_ids = self._get_add_time_ids(
1190
+ original_size,
1191
+ crops_coords_top_left,
1192
+ target_size,
1193
+ dtype=prompt_embeds.dtype,
1194
+ text_encoder_projection_dim=text_encoder_projection_dim,
1195
+ )
1196
+ if negative_original_size is not None and negative_target_size is not None:
1197
+ negative_add_time_ids = self._get_add_time_ids(
1198
+ negative_original_size,
1199
+ negative_crops_coords_top_left,
1200
+ negative_target_size,
1201
+ dtype=prompt_embeds.dtype,
1202
+ text_encoder_projection_dim=text_encoder_projection_dim,
1203
+ )
1204
+ else:
1205
+ negative_add_time_ids = add_time_ids
1206
+
1207
+ if self.do_classifier_free_guidance:
1208
+ prompt_embeds = torch.cat([negative_prompt_embeds, prompt_embeds], dim=0)
1209
+ add_text_embeds = torch.cat([negative_pooled_prompt_embeds, add_text_embeds], dim=0)
1210
+ add_time_ids = torch.cat([negative_add_time_ids, add_time_ids], dim=0)
1211
+
1212
+ prompt_embeds = prompt_embeds.to(device)
1213
+ add_text_embeds = add_text_embeds.to(device)
1214
+ add_time_ids = add_time_ids.to(device).repeat(batch_size * num_images_per_prompt, 1)
1215
+
1216
+ if ip_adapter_image is not None or ip_adapter_image_embeds is not None:
1217
+ image_embeds = self.prepare_ip_adapter_image_embeds(
1218
+ ip_adapter_image,
1219
+ ip_adapter_image_embeds,
1220
+ device,
1221
+ batch_size * num_images_per_prompt,
1222
+ self.do_classifier_free_guidance,
1223
+ )
1224
+
1225
+ # 8. Denoising loop
1226
+ num_warmup_steps = max(len(timesteps) - num_inference_steps * self.scheduler.order, 0)
1227
+
1228
+ # 8.1 Apply denoising_end
1229
+ if (
1230
+ self.denoising_end is not None
1231
+ and isinstance(self.denoising_end, float)
1232
+ and self.denoising_end > 0
1233
+ and self.denoising_end < 1
1234
+ ):
1235
+ discrete_timestep_cutoff = int(
1236
+ round(
1237
+ self.scheduler.config.num_train_timesteps
1238
+ - (self.denoising_end * self.scheduler.config.num_train_timesteps)
1239
+ )
1240
+ )
1241
+ num_inference_steps = len(list(filter(lambda ts: ts >= discrete_timestep_cutoff, timesteps)))
1242
+ timesteps = timesteps[:num_inference_steps]
1243
+
1244
+ # 9. Optionally get Guidance Scale Embedding
1245
+ timestep_cond = None
1246
+ if self.unet.config.time_cond_proj_dim is not None:
1247
+ guidance_scale_tensor = torch.tensor(self.guidance_scale - 1).repeat(batch_size * num_images_per_prompt)
1248
+ timestep_cond = self.get_guidance_scale_embedding(
1249
+ guidance_scale_tensor, embedding_dim=self.unet.config.time_cond_proj_dim
1250
+ ).to(device=device, dtype=latents.dtype)
1251
+
1252
+ self._num_timesteps = len(timesteps)
1253
+
1254
+ with self.progress_bar(total=num_inference_steps) as progress_bar:
1255
+ for i, t in enumerate(timesteps):
1256
+ if self.interrupt:
1257
+ continue
1258
+
1259
+ if unet_controller is not None:
1260
+ unet_controller.current_time_step = i
1261
+
1262
+ # Expand the latents if we are doing classifier-free guidance
1263
+ latent_model_input = torch.cat([latents] * 2) if self.do_classifier_free_guidance else latents
1264
+ latent_model_input = self.scheduler.scale_model_input(latent_model_input, t)
1265
+
1266
+ # Predict the noise residual
1267
+ added_cond_kwargs = {"text_embeds": add_text_embeds, "time_ids": add_time_ids}
1268
+ if ip_adapter_image is not None or ip_adapter_image_embeds is not None:
1269
+ added_cond_kwargs["image_embeds"] = image_embeds
1270
+
1271
+ noise_pred = self.unet(
1272
+ latent_model_input,
1273
+ t,
1274
+ encoder_hidden_states=prompt_embeds,
1275
+ timestep_cond=timestep_cond,
1276
+ cross_attention_kwargs=self.cross_attention_kwargs,
1277
+ added_cond_kwargs=added_cond_kwargs,
1278
+ unet_controller=unet_controller,
1279
+ return_dict=False,
1280
+ )[0]
1281
+
1282
+ # Perform guidance
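+ # Classifier-free guidance combines the two UNet passes as
+ # noise_pred = eps_uncond + guidance_scale * (eps_text - eps_uncond);
+ # guidance_scale > 1 pushes the sample toward the text condition.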
1283
+ if self.do_classifier_free_guidance:
1284
+ noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
1285
+ noise_pred = noise_pred_uncond + self.guidance_scale * (noise_pred_text - noise_pred_uncond)
1286
+
1287
+ if self.do_classifier_free_guidance and self.guidance_rescale > 0.0:
1288
+ # Based on 3.4. in https://arxiv.org/pdf/2305.08891.pdf
1289
+ noise_pred = rescale_noise_cfg(noise_pred, noise_pred_text, guidance_rescale=self.guidance_rescale)
1290
+
1291
+ # Compute the previous noisy sample x_t -> x_t-1
1292
+ latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwargs, return_dict=False)[0]
1293
+
1294
+ if callback_on_step_end is not None:
1295
+ callback_kwargs = {}
1296
+ for k in callback_on_step_end_tensor_inputs:
1297
+ callback_kwargs[k] = locals()[k]
1298
+ callback_outputs = callback_on_step_end(self, i, t, callback_kwargs)
1299
+
1300
+ latents = callback_outputs.pop("latents", latents)
1301
+ prompt_embeds = callback_outputs.pop("prompt_embeds", prompt_embeds)
1302
+ negative_prompt_embeds = callback_outputs.pop("negative_prompt_embeds", negative_prompt_embeds)
1303
+ add_text_embeds = callback_outputs.pop("add_text_embeds", add_text_embeds)
1304
+ negative_pooled_prompt_embeds = callback_outputs.pop(
1305
+ "negative_pooled_prompt_embeds", negative_pooled_prompt_embeds
1306
+ )
1307
+ add_time_ids = callback_outputs.pop("add_time_ids", add_time_ids)
1308
+ negative_add_time_ids = callback_outputs.pop("negative_add_time_ids", negative_add_time_ids)
1309
+
1310
+ # call the callback, if provided
1311
+ if i == len(timesteps) - 1 or ((i + 1) > num_warmup_steps and (i + 1) % self.scheduler.order == 0):
1312
+ progress_bar.update()
1313
+ if callback is not None and i % callback_steps == 0:
1314
+ step_idx = i // getattr(self.scheduler, "order", 1)
1315
+ callback(step_idx, t, latents)
1316
+
1317
+ # if XLA_AVAILABLE:
1318
+ # xm.mark_step()
1319
+
1320
+ if not output_type == "latent":
1321
+ # make sure the VAE is in float32 mode, as it overflows in float16
1322
+ needs_upcasting = self.vae.dtype == torch.float16 and self.vae.config.force_upcast
1323
+
1324
+ if needs_upcasting:
1325
+ self.upcast_vae()
1326
+ latents = latents.to(next(iter(self.vae.post_quant_conv.parameters())).dtype)
1327
+
1328
+ # unscale/denormalize the latents
1329
+ # denormalize with the mean and std if available and not None
1330
+ has_latents_mean = hasattr(self.vae.config, "latents_mean") and self.vae.config.latents_mean is not None
1331
+ has_latents_std = hasattr(self.vae.config, "latents_std") and self.vae.config.latents_std is not None
1332
+ if has_latents_mean and has_latents_std:
1333
+ latents_mean = (
1334
+ torch.tensor(self.vae.config.latents_mean).view(1, 4, 1, 1).to(latents.device, latents.dtype)
1335
+ )
1336
+ latents_std = (
1337
+ torch.tensor(self.vae.config.latents_std).view(1, 4, 1, 1).to(latents.device, latents.dtype)
1338
+ )
1339
+ latents = latents * latents_std / self.vae.config.scaling_factor + latents_mean
1340
+ else:
1341
+ latents = latents / self.vae.config.scaling_factor
1342
+
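+ # Undo the latent scaling the VAE was trained with before decoding back to
+ # pixel space (for SDXL's VAE, `scaling_factor` is 0.13025).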
1343
+ image = self.vae.decode(latents, return_dict=False)[0]
1344
+
1345
+ # cast back to fp16 if needed
1346
+ if needs_upcasting:
1347
+ self.vae.to(dtype=torch.float16)
1348
+ else:
1349
+ image = latents
1350
+
1351
+ if not output_type == "latent":
1352
+ # apply watermark if available
1353
+ if self.watermark is not None:
1354
+ image = self.watermark.apply_watermark(image)
1355
+
1356
+ image = self.image_processor.postprocess(image, output_type=output_type)
1357
+
1358
+ # Offload all models
1359
+ self.maybe_free_model_hooks()
1360
+
1361
+ if not return_dict:
1362
+ return (image,)
1363
+
1364
+ return StableDiffusionXLPipelineOutput(images=image)
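
The guidance arithmetic in the denoising loop above can be checked in isolation. The sketch below reproduces the classifier-free-guidance step and the noise rescale of arXiv:2305.08891 on random tensors; the tensor shapes and the scale values (7.5, 0.7) are illustrative assumptions, not settings taken from this pipeline.

```python
import torch

def rescale_noise_cfg(noise_cfg, noise_pred_text, guidance_rescale=0.0):
    # Rescale the guided noise toward the std of the text branch (arXiv:2305.08891, Sec. 3.4).
    std_text = noise_pred_text.std(dim=list(range(1, noise_pred_text.ndim)), keepdim=True)
    std_cfg = noise_cfg.std(dim=list(range(1, noise_cfg.ndim)), keepdim=True)
    rescaled = noise_cfg * (std_text / std_cfg)
    return guidance_rescale * rescaled + (1 - guidance_rescale) * noise_cfg

noise_pred = torch.randn(2, 4, 128, 128)                  # [uncond, text] stacked on the batch dim
noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
noise_pred = noise_pred_uncond + 7.5 * (noise_pred_text - noise_pred_uncond)
noise_pred = rescale_noise_cfg(noise_pred, noise_pred_text, guidance_rescale=0.7)
print(noise_pred.shape)                                   # torch.Size([1, 4, 128, 128])
```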
unet/unet.py ADDED
@@ -0,0 +1,599 @@
+ # modified from https://github.com/cloneofsimo/minSDXL
+
+
+ import torch
+ import torch.nn as nn
+ import torch.nn.functional as F
+ import math
+ from diffusers.models.modeling_utils import ModelMixin
+ from diffusers.configuration_utils import ConfigMixin
+ from typing import Optional
+
+ from unet.unet_controller import UNetController
+ import unet.utils as utils
+ # SDXL
+
+
+ class Timesteps(nn.Module):
+     def __init__(self, num_channels: int = 320):
+         super().__init__()
+         self.num_channels = num_channels
+
+     def forward(self, timesteps):
+         half_dim = self.num_channels // 2
+         exponent = -math.log(10000) * torch.arange(
+             half_dim, dtype=torch.float32, device=timesteps.device
+         )
+         exponent = exponent / (half_dim - 0.0)
+
+         emb = torch.exp(exponent)
+         emb = timesteps[:, None].float() * emb[None, :]
+
+         sin_emb = torch.sin(emb)
+         cos_emb = torch.cos(emb)
+         emb = torch.cat([cos_emb, sin_emb], dim=-1)
+
+         return emb
+
+
+ class TimestepEmbedding(nn.Module):
+     def __init__(self, in_features, out_features):
+         super(TimestepEmbedding, self).__init__()
+         self.linear_1 = nn.Linear(in_features, out_features, bias=True)
+         self.act = nn.SiLU()
+         self.linear_2 = nn.Linear(out_features, out_features, bias=True)
+
+     def forward(self, sample):
+         sample = self.linear_1(sample)
+         sample = self.act(sample)
+         sample = self.linear_2(sample)
+
+         return sample
+
+
+ class ResnetBlock2D(nn.Module):
+     def __init__(self, in_channels, out_channels, conv_shortcut=True):
+         super(ResnetBlock2D, self).__init__()
+         self.norm1 = nn.GroupNorm(32, in_channels, eps=1e-05, affine=True)
+         self.conv1 = nn.Conv2d(
+             in_channels, out_channels, kernel_size=3, stride=1, padding=1
+         )
+         self.time_emb_proj = nn.Linear(1280, out_channels, bias=True)
+         self.norm2 = nn.GroupNorm(32, out_channels, eps=1e-05, affine=True)
+         self.dropout = nn.Dropout(p=0.0, inplace=False)
+         self.conv2 = nn.Conv2d(
+             out_channels, out_channels, kernel_size=3, stride=1, padding=1
+         )
+         self.nonlinearity = nn.SiLU()
+         self.conv_shortcut = None
+         if conv_shortcut:
+             self.conv_shortcut = nn.Conv2d(
+                 in_channels, out_channels, kernel_size=1, stride=1
+             )
+
+     def forward(self, input_tensor, temb):
+         hidden_states = input_tensor
+         hidden_states = self.norm1(hidden_states)
+         hidden_states = self.nonlinearity(hidden_states)
+
+         hidden_states = self.conv1(hidden_states)
+
+         temb = self.nonlinearity(temb)
+         temb = self.time_emb_proj(temb)[:, :, None, None]
+         hidden_states = hidden_states + temb
+         hidden_states = self.norm2(hidden_states)
+
+         hidden_states = self.nonlinearity(hidden_states)
+         hidden_states = self.dropout(hidden_states)
+         hidden_states = self.conv2(hidden_states)
+
+         if self.conv_shortcut is not None:
+             input_tensor = self.conv_shortcut(input_tensor)
+
+         output_tensor = input_tensor + hidden_states
+
+         return output_tensor
+
+
+ class Attention(nn.Module):
+     def __init__(
+         self, inner_dim, cross_attention_dim=None, num_heads=None, dropout=0.0
+     ):
+         super(Attention, self).__init__()
+         if num_heads is None:
+             self.head_dim = 64
+             self.num_heads = inner_dim // self.head_dim
+         else:
+             self.num_heads = num_heads
+             self.head_dim = inner_dim // num_heads
+
+         self.scale = self.head_dim**-0.5
+         if cross_attention_dim is None:
+             cross_attention_dim = inner_dim
+         self.to_q = nn.Linear(inner_dim, inner_dim, bias=False)
+         self.to_k = nn.Linear(cross_attention_dim, inner_dim, bias=False)
+         self.to_v = nn.Linear(cross_attention_dim, inner_dim, bias=False)
+
+         self.to_out = nn.ModuleList(
+             [nn.Linear(inner_dim, inner_dim), nn.Dropout(dropout, inplace=False)]
+         )
+
+     def forward(self, hidden_states, encoder_hidden_states=None, unet_controller: Optional[UNetController] = None):
+         q = self.to_q(hidden_states)
+         k = (
+             self.to_k(encoder_hidden_states)
+             if encoder_hidden_states is not None
+             else self.to_k(hidden_states)
+         )
+         v = (
+             self.to_v(encoder_hidden_states)
+             if encoder_hidden_states is not None
+             else self.to_v(hidden_states)
+         )
+         b, t, c = q.size()
+
+         q = q.view(q.size(0), q.size(1), self.num_heads, self.head_dim).transpose(1, 2)
+         k = k.view(k.size(0), k.size(1), self.num_heads, self.head_dim).transpose(1, 2)
+         v = v.view(v.size(0), v.size(1), self.num_heads, self.head_dim).transpose(1, 2)
+
+         # IPCA is applied only to cross-attention layers (encoder_hidden_states is not None),
+         # only at the configured UNet positions, and only once the start step has been reached.
+         if (unet_controller is not None and unet_controller.Use_ipca and unet_controller.current_unet_position in unet_controller.Ipca_position
+                 and encoder_hidden_states is not None and unet_controller.current_time_step >= unet_controller.Ipca_start_step):
+             if unet_controller.do_classifier_free_guidance is True:
+                 scores = torch.matmul(q, k.transpose(-2, -1)) * self.scale
+                 attn_weights = torch.softmax(scores, dim=-1)  # only used by the cross-attention map store
+                 ipca_attn_output = utils.ipca2(q, k, v, self.scale, unet_controller=unet_controller)
+                 attn_output = ipca_attn_output
+             else:
+                 exit("cfg = 1.0 is currently not supported")
+         else:
+             scores = torch.matmul(q, k.transpose(-2, -1)) * self.scale
+             attn_weights = torch.softmax(scores, dim=-1)
+             attn_output = torch.matmul(attn_weights, v)
+
+         attn_output = attn_output.transpose(1, 2).contiguous().view(b, t, c)
+
+         for layer in self.to_out:
+             attn_output = layer(attn_output)
+
+         return attn_output
+
+
+ class GEGLU(nn.Module):
+     def __init__(self, in_features, out_features):
+         super(GEGLU, self).__init__()
+         self.proj = nn.Linear(in_features, out_features * 2, bias=True)
+
+     def forward(self, x):
+         x_proj = self.proj(x)
+         x1, x2 = x_proj.chunk(2, dim=-1)
+         return x1 * torch.nn.functional.gelu(x2)
+
+
+ class FeedForward(nn.Module):
+     def __init__(self, in_features, out_features):
+         super(FeedForward, self).__init__()
+
+         self.net = nn.ModuleList(
+             [
+                 GEGLU(in_features, out_features * 4),
+                 nn.Dropout(p=0.0, inplace=False),
+                 nn.Linear(out_features * 4, out_features, bias=True),
+             ]
+         )
+
+     def forward(self, x):
+         for layer in self.net:
+             x = layer(x)
+         return x
+
+
+ class BasicTransformerBlock(nn.Module):
+     def __init__(self, hidden_size):
+         super(BasicTransformerBlock, self).__init__()
+         self.norm1 = nn.LayerNorm(hidden_size, eps=1e-05, elementwise_affine=True)
+         self.attn1 = Attention(hidden_size)
+         self.norm2 = nn.LayerNorm(hidden_size, eps=1e-05, elementwise_affine=True)
+         self.attn2 = Attention(hidden_size, 2048)
+         self.norm3 = nn.LayerNorm(hidden_size, eps=1e-05, elementwise_affine=True)
+         self.ff = FeedForward(hidden_size, hidden_size)
+
+     def forward(self, x, encoder_hidden_states=None, unet_controller: Optional[UNetController] = None):
+         residual = x
+
+         x = self.norm1(x)
+         x = self.attn1(x, unet_controller=unet_controller)
+         x = x + residual
+
+         residual = x
+
+         x = self.norm2(x)
+         if encoder_hidden_states is not None:
+             x = self.attn2(x, encoder_hidden_states, unet_controller=unet_controller)
+         else:
+             x = self.attn2(x, unet_controller=unet_controller)
+         x = x + residual
+
+         residual = x
+
+         x = self.norm3(x)
+         x = self.ff(x)
+         x = x + residual
+         return x
+
+
+ class Transformer2DModel(nn.Module):
+     def __init__(self, in_channels, out_channels, n_layers):
+         super(Transformer2DModel, self).__init__()
+         self.norm = nn.GroupNorm(32, in_channels, eps=1e-06, affine=True)
+         self.proj_in = nn.Linear(in_channels, out_channels, bias=True)
+         self.transformer_blocks = nn.ModuleList(
+             [BasicTransformerBlock(out_channels) for _ in range(n_layers)]
+         )
+         self.proj_out = nn.Linear(out_channels, out_channels, bias=True)
+
+     def forward(self, hidden_states, encoder_hidden_states=None, unet_controller: Optional[UNetController] = None):
+         batch, _, height, width = hidden_states.shape
+         res = hidden_states
+         hidden_states = self.norm(hidden_states)
+         inner_dim = hidden_states.shape[1]
+         hidden_states = hidden_states.permute(0, 2, 3, 1).reshape(
+             batch, height * width, inner_dim
+         )
+         hidden_states = self.proj_in(hidden_states)
+
+         for block in self.transformer_blocks:
+             hidden_states = block(hidden_states, encoder_hidden_states, unet_controller=unet_controller)
+
+         hidden_states = self.proj_out(hidden_states)
+         hidden_states = (
+             hidden_states.reshape(batch, height, width, inner_dim)
+             .permute(0, 3, 1, 2)
+             .contiguous()
+         )
+
+         return hidden_states + res
+
+
+ class Downsample2D(nn.Module):
+     def __init__(self, in_channels, out_channels):
+         super(Downsample2D, self).__init__()
+         self.conv = nn.Conv2d(
+             in_channels, out_channels, kernel_size=3, stride=2, padding=1
+         )
+
+     def forward(self, x):
+         return self.conv(x)
+
+
+ class Upsample2D(nn.Module):
+     def __init__(self, in_channels, out_channels):
+         super(Upsample2D, self).__init__()
+         self.conv = nn.Conv2d(
+             in_channels, out_channels, kernel_size=3, stride=1, padding=1
+         )
+
+     def forward(self, x):
+         x = F.interpolate(x, scale_factor=2.0, mode="nearest")
+         return self.conv(x)
+
+
+ class DownBlock2D(nn.Module):
+     def __init__(self, in_channels, out_channels):
+         super(DownBlock2D, self).__init__()
+         self.resnets = nn.ModuleList(
+             [
+                 ResnetBlock2D(in_channels, out_channels, conv_shortcut=False),
+                 ResnetBlock2D(out_channels, out_channels, conv_shortcut=False),
+             ]
+         )
+         self.downsamplers = nn.ModuleList([Downsample2D(out_channels, out_channels)])
+
+     def forward(self, hidden_states, temb):
+         output_states = []
+         for module in self.resnets:
+             hidden_states = module(hidden_states, temb)
+             output_states.append(hidden_states)
+
+         hidden_states = self.downsamplers[0](hidden_states)
+         output_states.append(hidden_states)
+
+         return hidden_states, output_states
+
+
+ class CrossAttnDownBlock2D(nn.Module):
+     def __init__(self, in_channels, out_channels, n_layers, has_downsamplers=True):
+         super(CrossAttnDownBlock2D, self).__init__()
+         self.attentions = nn.ModuleList(
+             [
+                 Transformer2DModel(out_channels, out_channels, n_layers),
+                 Transformer2DModel(out_channels, out_channels, n_layers),
+             ]
+         )
+         self.resnets = nn.ModuleList(
+             [
+                 ResnetBlock2D(in_channels, out_channels),
+                 ResnetBlock2D(out_channels, out_channels, conv_shortcut=False),
+             ]
+         )
+         self.downsamplers = None
+         if has_downsamplers:
+             self.downsamplers = nn.ModuleList(
+                 [Downsample2D(out_channels, out_channels)]
+             )
+
+     def forward(self, hidden_states, temb, encoder_hidden_states, unet_controller: Optional[UNetController] = None):
+         output_states = []
+         for resnet, attn in zip(self.resnets, self.attentions):
+             hidden_states = resnet(hidden_states, temb)
+             hidden_states = attn(
+                 hidden_states,
+                 encoder_hidden_states=encoder_hidden_states,
+                 unet_controller=unet_controller,
+             )
+             output_states.append(hidden_states)
+
+         if self.downsamplers is not None:
+             hidden_states = self.downsamplers[0](hidden_states)
+             output_states.append(hidden_states)
+
+         return hidden_states, output_states
+
+
+ class CrossAttnUpBlock2D(nn.Module):
+     def __init__(self, in_channels, out_channels, prev_output_channel, n_layers):
+         super(CrossAttnUpBlock2D, self).__init__()
+         self.attentions = nn.ModuleList(
+             [
+                 Transformer2DModel(out_channels, out_channels, n_layers),
+                 Transformer2DModel(out_channels, out_channels, n_layers),
+                 Transformer2DModel(out_channels, out_channels, n_layers),
+             ]
+         )
+         self.resnets = nn.ModuleList(
+             [
+                 ResnetBlock2D(prev_output_channel + out_channels, out_channels),
+                 ResnetBlock2D(2 * out_channels, out_channels),
+                 ResnetBlock2D(out_channels + in_channels, out_channels),
+             ]
+         )
+         self.upsamplers = nn.ModuleList([Upsample2D(out_channels, out_channels)])
+
+     def forward(
+         self, hidden_states, res_hidden_states_tuple, temb, encoder_hidden_states, unet_controller: Optional[UNetController] = None,
+     ):
+         for resnet, attn in zip(self.resnets, self.attentions):
+             # pop res hidden states
+             res_hidden_states = res_hidden_states_tuple[-1]
+             res_hidden_states_tuple = res_hidden_states_tuple[:-1]
+
+             if unet_controller is not None and unet_controller.Is_freeu_enabled:
+                 hidden_states, res_hidden_states = utils.apply_freeu(
+                     0 if unet_controller.current_unet_position == 'up0' else 1,
+                     hidden_states,
+                     res_hidden_states,
+                     s1=unet_controller.Freeu_parm['s1'],
+                     s2=unet_controller.Freeu_parm['s2'],
+                     b1=unet_controller.Freeu_parm['b1'],
+                     b2=unet_controller.Freeu_parm['b2'],
+                 )
+
+             hidden_states = torch.cat([hidden_states, res_hidden_states], dim=1)
+             hidden_states = resnet(hidden_states, temb)
+             hidden_states = attn(
+                 hidden_states,
+                 encoder_hidden_states=encoder_hidden_states,
+                 unet_controller=unet_controller,
+             )
+
+         if self.upsamplers is not None:
+             for upsampler in self.upsamplers:
+                 hidden_states = upsampler(hidden_states)
+
+         return hidden_states
+
+
+ class UpBlock2D(nn.Module):
+     def __init__(self, in_channels, out_channels, prev_output_channel):
+         super(UpBlock2D, self).__init__()
+         self.resnets = nn.ModuleList(
+             [
+                 ResnetBlock2D(out_channels + prev_output_channel, out_channels),
+                 ResnetBlock2D(out_channels * 2, out_channels),
+                 ResnetBlock2D(out_channels + in_channels, out_channels),
+             ]
+         )
+
+     def forward(self, hidden_states, res_hidden_states_tuple, temb=None):
+         for resnet in self.resnets:
+             res_hidden_states = res_hidden_states_tuple[-1]
+             res_hidden_states_tuple = res_hidden_states_tuple[:-1]
+             hidden_states = torch.cat([hidden_states, res_hidden_states], dim=1)
+             hidden_states = resnet(hidden_states, temb)
+
+         return hidden_states
+
+
+ class UNetMidBlock2DCrossAttn(nn.Module):
+     def __init__(self, in_features):
+         super(UNetMidBlock2DCrossAttn, self).__init__()
+         self.attentions = nn.ModuleList(
+             [Transformer2DModel(in_features, in_features, n_layers=10)]
+         )
+         self.resnets = nn.ModuleList(
+             [
+                 ResnetBlock2D(in_features, in_features, conv_shortcut=False),
+                 ResnetBlock2D(in_features, in_features, conv_shortcut=False),
+             ]
+         )
+
+     def forward(self, hidden_states, temb=None, encoder_hidden_states=None, unet_controller: Optional[UNetController] = None):
+         hidden_states = self.resnets[0](hidden_states, temb)
+         for attn, resnet in zip(self.attentions, self.resnets[1:]):
+             hidden_states = attn(
+                 hidden_states,
+                 encoder_hidden_states=encoder_hidden_states,
+                 unet_controller=unet_controller,
+             )
+             hidden_states = resnet(hidden_states, temb)
+
+         return hidden_states
+
+
+ class UNet2DConditionModel(ModelMixin, ConfigMixin):
+     def __init__(self):
+         super(UNet2DConditionModel, self).__init__()  # initialize the parent classes first
+
+         # This is needed to imitate huggingface config behavior;
+         # it has nothing to do with the model itself.
+         # Remove this if you don't use diffusers' pipeline.
+         # self.config = namedtuple(
+         #     "config", "in_channels addition_time_embed_dim sample_size time_cond_proj_dim"
+         # )
+         # self.config.in_channels = 4
+         # self.config.addition_time_embed_dim = 256
+         # self.config.sample_size = 128
+         # self.config.time_cond_proj_dim = None
+
+         self.conv_in = nn.Conv2d(4, 320, kernel_size=3, stride=1, padding=1)
+         self.time_proj = Timesteps()
+         self.time_embedding = TimestepEmbedding(in_features=320, out_features=1280)
+         self.add_time_proj = Timesteps(256)
+         self.add_embedding = TimestepEmbedding(in_features=2816, out_features=1280)
+         self.down_blocks = nn.ModuleList(
+             [
+                 DownBlock2D(in_channels=320, out_channels=320),
+                 CrossAttnDownBlock2D(in_channels=320, out_channels=640, n_layers=2),
+                 CrossAttnDownBlock2D(
+                     in_channels=640,
+                     out_channels=1280,
+                     n_layers=10,
+                     has_downsamplers=False,
+                 ),
+             ]
+         )
+         self.up_blocks = nn.ModuleList(
+             [
+                 CrossAttnUpBlock2D(
+                     in_channels=640,
+                     out_channels=1280,
+                     prev_output_channel=1280,
+                     n_layers=10,
+                 ),
+                 CrossAttnUpBlock2D(
+                     in_channels=320,
+                     out_channels=640,
+                     prev_output_channel=1280,
+                     n_layers=2,
+                 ),
+                 UpBlock2D(in_channels=320, out_channels=320, prev_output_channel=640),
+             ]
+         )
+         self.mid_block = UNetMidBlock2DCrossAttn(1280)
+         self.conv_norm_out = nn.GroupNorm(32, 320, eps=1e-05, affine=True)
+         self.conv_act = nn.SiLU()
+         self.conv_out = nn.Conv2d(320, 4, kernel_size=3, stride=1, padding=1)
+
+     def forward(
+         self, sample, timesteps, encoder_hidden_states, added_cond_kwargs, unet_controller: Optional[UNetController] = None, **kwargs
+     ):
+         # Implement the forward pass through the model
+         timesteps = timesteps.expand(sample.shape[0])
+         t_emb = self.time_proj(timesteps).to(dtype=sample.dtype)
+
+         emb = self.time_embedding(t_emb)
+
+         text_embeds = added_cond_kwargs.get("text_embeds")
+         time_ids = added_cond_kwargs.get("time_ids")
+
+         time_embeds = self.add_time_proj(time_ids.flatten())
+         time_embeds = time_embeds.reshape((text_embeds.shape[0], -1))
+
+         add_embeds = torch.concat([text_embeds, time_embeds], dim=-1)
+         add_embeds = add_embeds.to(emb.dtype)
+         aug_emb = self.add_embedding(add_embeds)
+
+         emb = emb + aug_emb
+
+         sample = self.conv_in(sample)
+
+         # 3. down
+         if unet_controller is not None:
+             unet_controller.current_unet_position = 'down0'
+
+         s0 = sample
+         sample, [s1, s2, s3] = self.down_blocks[0](
+             sample,
+             temb=emb,
+         )
+
+         if unet_controller is not None:
+             unet_controller.current_unet_position = 'down1'
+
+         # encoder_hidden_states are the prompt embeddings, so these blocks apply cross-attention
+         sample, [s4, s5, s6] = self.down_blocks[1](
+             sample,
+             temb=emb,  # time embedding
+             encoder_hidden_states=encoder_hidden_states,  # [2, 77, 2048]; batch of 2: one branch for the prompt, one for the negative prompt
+             unet_controller=unet_controller,
+         )
+
+         if unet_controller is not None:
+             unet_controller.current_unet_position = 'down2'
+
+         sample, [s7, s8] = self.down_blocks[2](
+             sample,
+             temb=emb,
+             encoder_hidden_states=encoder_hidden_states,
+             unet_controller=unet_controller,
+         )
+
+         # 4. mid
+         if unet_controller is not None:
+             unet_controller.current_unet_position = 'mid'
+
+         sample = self.mid_block(
+             sample, emb, encoder_hidden_states=encoder_hidden_states, unet_controller=unet_controller,
+         )
+
+         # 5. up
+         if unet_controller is not None:
+             unet_controller.current_unet_position = 'up0'
+
+         sample = self.up_blocks[0](
+             hidden_states=sample,
+             temb=emb,
+             res_hidden_states_tuple=[s6, s7, s8],
+             encoder_hidden_states=encoder_hidden_states,
+             unet_controller=unet_controller,
+         )
+
+         if unet_controller is not None:
+             unet_controller.current_unet_position = 'up1'
+
+         sample = self.up_blocks[1](
+             hidden_states=sample,
+             temb=emb,
+             res_hidden_states_tuple=[s3, s4, s5],
+             encoder_hidden_states=encoder_hidden_states,
+             unet_controller=unet_controller,
+         )
+
+         if unet_controller is not None:
+             unet_controller.current_unet_position = 'up2'
+
+         sample = self.up_blocks[2](
+             hidden_states=sample,
+             temb=emb,
+             res_hidden_states_tuple=[s0, s1, s2],
+         )
+
+         # 6. post-process
+         sample = self.conv_norm_out(sample)
+         sample = self.conv_act(sample)
+         sample = self.conv_out(sample)
+
+         return [sample]
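
A quick way to sanity-check the tensor plumbing of this hand-written SDXL UNet is a random-weight forward pass. The sketch below follows SDXL conventions (4 latent channels, 77×2048 token embeddings, a 1280-dim pooled embedding, and 6 micro-conditioning ids, which the model projects and concatenates into the expected 2816-dim conditioning vector). The 64×64 latent, CPU float32 execution, and omitting the UNetController (which keeps the IPCA branches inactive) are assumptions made to keep the smoke test simple; note the randomly initialized model itself is still roughly 2.6B parameters.

```python
# Hypothetical smoke test for unet/unet.py (random weights, shapes assumed from SDXL).
import torch
from unet.unet import UNet2DConditionModel

model = UNet2DConditionModel()                     # randomly initialized
sample = torch.randn(2, 4, 64, 64)                 # latent batch of 2 (negative + positive branch)
timesteps = torch.tensor([999])                    # expanded to the batch inside forward()
encoder_hidden_states = torch.randn(2, 77, 2048)   # CLIP token embeddings
added_cond_kwargs = {
    "text_embeds": torch.randn(2, 1280),           # pooled text embedding
    "time_ids": torch.randn(2, 6),                 # SDXL micro-conditioning ids
}
with torch.no_grad():
    out = model(sample, timesteps, encoder_hidden_states, added_cond_kwargs)[0]
print(out.shape)  # torch.Size([2, 4, 64, 64])
```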
unet/unet_controller.py ADDED
@@ -0,0 +1,73 @@
+ import torch
+
+
+ class UNetController():
+     # Static variables (hyperparameters)
+     Is_freeu_enabled = False
+     Freeu_parm = {'s1': 0.6, 's2': 0.4, 'b1': 1.1, 'b2': 1.2}
+
+     # IPCA parameters
+     Use_ipca = True
+     Ipca_position = ['down0', 'down1', 'down2', 'mid', 'up0', 'up1', 'up2']
+     Ipca_start_step = 0
+     Ipca_dropout = 0.0
+     Use_embeds_mask = True
+
+     # SVR parameters
+     Alpha_weaken = 0.01    # 0.01 ~ 0.5
+     Beta_weaken = 0.05     # 0.05 ~ 1.0
+     Alpha_enhance = -0.01  # -0.001 ~ -0.02
+     Beta_enhance = 1.0     # 1.0 ~ 2.0
+
+     # SVR settings
+     Prompt_embeds_mode = 'svr'
+     Remove_pool_embeds = False
+     Prompt_embeds_start_step = 0
+
+     Store_qkv = True
+
+     # Other settings
+     Use_same_latents = True
+     Use_same_init_noise = True
+     Save_story_image = True
+
+     def __init__(self):
+         self._variables = {}
+
+         ## Variables (updated during inference) ##
+         self.device = "cuda"
+         self.current_unet_position = None  # e.g. 'down0', 'mid', 'up0'
+         self.torch_dtype = torch.float16
+
+         self.prompts = None
+         self.negative_prompt = None
+         self.id_prompt = None
+         self.frame_prompt_express = None
+         self.frame_prompt_suppress = None
+
+         self.frame_prompt_express_list = None
+         self.frame_prompt_suppress_list = None
+
+         self.tokenizer = None
+         self.result_save_dir = None
+         self.current_time_step = None
+         self.do_classifier_free_guidance = None
+
+         self.q_store = {}
+         self.k_store = {}
+         self.v_store = {}
+
+         self.ipca2_index = -1
+         self.ipca_time_step = -1
+         ## Variables End ##
+
+     def print_attributes(self):
+         """
+         Print all attributes and their values of the object.
+         """
+         for attr, value in vars(self).items():
+             print(f"{attr}: {value}")
unet/utils.py ADDED
@@ -0,0 +1,483 @@
+ import os
+ import torch
+ from typing import Optional, Tuple
+ from PIL import Image
+ from diffusers import AutoencoderKL, EulerDiscreteScheduler, EDMDPMSolverMultistepScheduler
+ from transformers import (
+     CLIPTextModel,
+     CLIPTextModelWithProjection,
+     CLIPTokenizer,
+ )
+ from scipy.spatial.distance import cdist
+ import numpy as np
+ import unet.pipeline_stable_diffusion_xl as pipeline_stable_diffusion_xl
+ from torch.fft import fftn, fftshift, ifftn, ifftshift
+
+ from unet.unet import UNet2DConditionModel
+ from unet.unet_controller import UNetController
+
+
+ def ipca(q, k, v, scale, unet_controller: Optional[UNetController] = None):  # e.g. q: [4, 20, 1024, 64]; k, v: [4, 20, 77, 64]
+     q_neg, q_pos = torch.split(q, q.size(0) // 2, dim=0)
+     k_neg, k_pos = torch.split(k, k.size(0) // 2, dim=0)
+     v_neg, v_pos = torch.split(v, v.size(0) // 2, dim=0)
+
+     # 1. negative attention (plain attention on the negative-prompt branch)
+     scores_neg = torch.matmul(q_neg, k_neg.transpose(-2, -1)) * scale
+     attn_weights_neg = torch.softmax(scores_neg, dim=-1)
+     attn_output_neg = torch.matmul(attn_weights_neg, v_neg)
+
+     # 2. positive attention (IPCA is applied only on the positive branch)
+
+     # 2.1 IPCA: concatenate keys and values across the frames in the batch
+     k_plus = torch.cat(tuple(k_pos.transpose(-2, -1)), dim=2).unsqueeze(0).repeat(k_pos.size(0), 1, 1, 1)  # K+ = [K1 ⊕ K2 ⊕ ... ⊕ KN]
+     v_plus = torch.cat(tuple(v_pos), dim=1).unsqueeze(0).repeat(v_pos.size(0), 1, 1, 1)  # V+ = [V1 ⊕ V2 ⊕ ... ⊕ VN]
+
+     # 2.2 apply masks
+     if unet_controller is not None:
+         scores_pos = torch.matmul(q_pos, k_plus) * scale
+
+         # 2.2.1 apply the dropout mask
+         dropout_mask = gen_dropout_mask(scores_pos.shape, unet_controller, unet_controller.Ipca_dropout)  # e.g. [a, 1024, 154]
+
+         # 2.2.2 apply the embeds mask
+         if unet_controller.Use_embeds_mask:
+             apply_embeds_mask(unet_controller, dropout_mask, add_eot=False)
+
+         mask = dropout_mask
+         mask = mask.unsqueeze(1).repeat(1, scores_pos.size(1), 1, 1)
+         attn_weights_pos = torch.softmax(scores_pos + torch.log(mask), dim=-1)
+     else:
+         scores_pos = torch.matmul(q_pos, k_plus) * scale
+         attn_weights_pos = torch.softmax(scores_pos, dim=-1)
+
+     attn_output_pos = torch.matmul(attn_weights_pos, v_plus)
+
+     # 3. combine the two branches
+     attn_output = torch.cat((attn_output_neg, attn_output_pos), dim=0)
+
+     return attn_output
+
+
+ def ipca2(q, k, v, scale, unet_controller: Optional[UNetController] = None):  # e.g. q: [4, 20, 1024, 64]; k, v: [4, 20, 77, 64]
+     # Reset the per-step layer index whenever a new denoising step begins.
+     if unet_controller.ipca_time_step != unet_controller.current_time_step:
+         unet_controller.ipca_time_step = unet_controller.current_time_step
+         unet_controller.ipca2_index = 0
+     else:
+         unet_controller.ipca2_index += 1
+
+     if unet_controller.Store_qkv is True:
+         # First pass: store K/V per (time step, UNet position, layer) and run plain attention.
+         key = f"cross {unet_controller.current_time_step} {unet_controller.current_unet_position} {unet_controller.ipca2_index}"
+         unet_controller.k_store[key] = k
+         unet_controller.v_store[key] = v
+
+         scores = torch.matmul(q, k.transpose(-2, -1)) * scale
+         attn_weights = torch.softmax(scores, dim=-1)
+         attn_output = torch.matmul(attn_weights, v)
+     else:
+         # batch > 1
+         if unet_controller.frame_prompt_express_list is not None:
+             batch_size = q.size(0) // 2
+             attn_output_list = []
+
+             for i in range(batch_size):
+                 q_i = q[[i, i + batch_size], :, :, :]
+                 k_i = k[[i, i + batch_size], :, :, :]
+                 v_i = v[[i, i + batch_size], :, :, :]
+
+                 q_neg_i, q_pos_i = torch.split(q_i, q_i.size(0) // 2, dim=0)
+                 k_neg_i, k_pos_i = torch.split(k_i, k_i.size(0) // 2, dim=0)
+                 v_neg_i, v_pos_i = torch.split(v_i, v_i.size(0) // 2, dim=0)
+
+                 key = f"cross {unet_controller.current_time_step} {unet_controller.current_unet_position} {unet_controller.ipca2_index}"
+                 q_store = q_i
+                 k_store = unet_controller.k_store[key]
+                 v_store = unet_controller.v_store[key]
+
+                 q_store_neg, q_store_pos = torch.split(q_store, q_store.size(0) // 2, dim=0)
+                 k_store_neg, k_store_pos = torch.split(k_store, k_store.size(0) // 2, dim=0)
+                 v_store_neg, v_store_pos = torch.split(v_store, v_store.size(0) // 2, dim=0)
+
+                 q_neg = torch.cat((q_neg_i, q_store_neg), dim=0)
+                 q_pos = torch.cat((q_pos_i, q_store_pos), dim=0)
+                 k_neg = torch.cat((k_neg_i, k_store_neg), dim=0)
+                 k_pos = torch.cat((k_pos_i, k_store_pos), dim=0)
+                 v_neg = torch.cat((v_neg_i, v_store_neg), dim=0)
+                 v_pos = torch.cat((v_pos_i, v_store_pos), dim=0)
+
+                 q_i = torch.cat((q_neg, q_pos), dim=0)
+                 k_i = torch.cat((k_neg, k_pos), dim=0)
+                 v_i = torch.cat((v_neg, v_pos), dim=0)
+
+                 attn_output_i = ipca(q_i, k_i, v_i, scale, unet_controller)
+                 attn_output_i = attn_output_i[[0, 2], :, :, :]  # keep only the current frame's outputs
+                 attn_output_list.append(attn_output_i)
+
+             attn_output_ = torch.cat(attn_output_list, dim=0)
+             attn_output = torch.zeros(size=(q.size(0), attn_output_i.size(1), attn_output_i.size(2), attn_output_i.size(3)), device=q.device, dtype=q.dtype)
+             for i in range(batch_size):
+                 attn_output[i] = attn_output_[i * 2]
+             for i in range(batch_size):
+                 attn_output[i + batch_size] = attn_output_[i * 2 + 1]
+         # batch = 1
+         else:
+             q_neg, q_pos = torch.split(q, q.size(0) // 2, dim=0)
+             k_neg, k_pos = torch.split(k, k.size(0) // 2, dim=0)
+             v_neg, v_pos = torch.split(v, v.size(0) // 2, dim=0)
+
+             key = f"cross {unet_controller.current_time_step} {unet_controller.current_unet_position} {unet_controller.ipca2_index}"
+             q_store = q
+             k_store = unet_controller.k_store[key]
+             v_store = unet_controller.v_store[key]
+
+             q_store_neg, q_store_pos = torch.split(q_store, q_store.size(0) // 2, dim=0)
+             k_store_neg, k_store_pos = torch.split(k_store, k_store.size(0) // 2, dim=0)
+             v_store_neg, v_store_pos = torch.split(v_store, v_store.size(0) // 2, dim=0)
+
+             q_neg = torch.cat((q_neg, q_store_neg), dim=0)
+             q_pos = torch.cat((q_pos, q_store_pos), dim=0)
+             k_neg = torch.cat((k_neg, k_store_neg), dim=0)
+             k_pos = torch.cat((k_pos, k_store_pos), dim=0)
+             v_neg = torch.cat((v_neg, v_store_neg), dim=0)
+             v_pos = torch.cat((v_pos, v_store_pos), dim=0)
+
+             q = torch.cat((q_neg, q_pos), dim=0)
+             k = torch.cat((k_neg, k_pos), dim=0)
+             v = torch.cat((v_neg, v_pos), dim=0)
+
+             attn_output = ipca(q, k, v, scale, unet_controller)
+             attn_output = attn_output[[0, 2], :, :, :]
+
+     return attn_output
+
+
+ def apply_embeds_mask(unet_controller: Optional[UNetController], dropout_mask, add_eot=False):
+     id_prompt = unet_controller.id_prompt
+     prompt_tokens = prompt2tokens(unet_controller.tokenizer, unet_controller.prompts[0])
+
+     words_tokens = prompt2tokens(unet_controller.tokenizer, id_prompt)
+     words_tokens = [word for word in words_tokens if word != '<|endoftext|>' and word != '<|startoftext|>']
+     index_of_words = find_sublist_index(prompt_tokens, words_tokens)
+     index_list = [index + 77 for index in range(index_of_words, index_of_words + len(words_tokens))]
+     if add_eot:
+         index_list.extend([index + 77 for index, word in enumerate(prompt_tokens) if word == '<|endoftext|>'])
+
+     mask_indices = torch.arange(dropout_mask.size(-1), device=dropout_mask.device)
+     mask = (mask_indices >= 78) & (~torch.isin(mask_indices, torch.tensor(index_list, device=dropout_mask.device)))
+     dropout_mask[0, :, mask] = 0
+
+
+ def gen_dropout_mask(out_shape, unet_controller: Optional[UNetController], drop_out):
+     gen_length = out_shape[3]
+     attn_map_side_length = out_shape[2]
+
+     batch_num = out_shape[0]
+     mask_list = []
+
+     for prompt_index in range(batch_num):
+         start = prompt_index * int(gen_length / batch_num)
+         end = (prompt_index + 1) * int(gen_length / batch_num)
+
+         mask = torch.bernoulli(torch.full((attn_map_side_length, gen_length), 1 - drop_out, dtype=unet_controller.torch_dtype, device=unet_controller.device))
+         mask[:, start:end] = 1
+
+         mask_list.append(mask)
+
+     concatenated_mask = torch.stack(mask_list, dim=0)
+     return concatenated_mask
+
+
+ def load_pipe_from_path(model_path, device, torch_dtype, variant):
+     model_name = model_path.split('/')[-1]
+     if model_name == 'playground-v2.5-1024px-aesthetic':
+         scheduler = EDMDPMSolverMultistepScheduler.from_pretrained(model_path, subfolder="scheduler", torch_dtype=torch_dtype, variant=variant,)
+     else:
+         scheduler = EulerDiscreteScheduler.from_pretrained(model_path, subfolder="scheduler", torch_dtype=torch_dtype, variant=variant,)
+
+     if model_name == 'Juggernaut-X-v10' or model_name == 'Juggernaut-XI-v11':
+         variant = None
+
+     vae = AutoencoderKL.from_pretrained(model_path, subfolder="vae", torch_dtype=torch_dtype, variant=variant,)
+     tokenizer = CLIPTokenizer.from_pretrained(model_path, subfolder="tokenizer", torch_dtype=torch_dtype, variant=variant,)
+     tokenizer_2 = CLIPTokenizer.from_pretrained(model_path, subfolder="tokenizer_2", torch_dtype=torch_dtype, variant=variant,)
+     text_encoder = CLIPTextModel.from_pretrained(model_path, subfolder="text_encoder", torch_dtype=torch_dtype, variant=variant,)
+     text_encoder_2 = CLIPTextModelWithProjection.from_pretrained(model_path, subfolder="text_encoder_2", torch_dtype=torch_dtype, variant=variant,)
+     unet_new = UNet2DConditionModel.from_pretrained(model_path, subfolder="unet", torch_dtype=torch_dtype, variant=variant,)
+
+     pipe = pipeline_stable_diffusion_xl.StableDiffusionXLPipeline(
+         vae=vae,
+         text_encoder=text_encoder,
+         text_encoder_2=text_encoder_2,
+         tokenizer=tokenizer,
+         tokenizer_2=tokenizer_2,
+         unet=unet_new,
+         scheduler=scheduler,
+     )
+     pipe.to(device)
+
+     return pipe, model_name
+
+
+ def get_max_window_length(unet_controller: Optional[UNetController], id_prompt, frame_prompt_list):
+     single_long_prompt = id_prompt
+     max_window_length = 0
+     for movement in frame_prompt_list:
+         single_long_prompt += ' ' + movement
+         token_length = len(single_long_prompt.split())  # approximate the token count by the word count
+         if token_length >= 77:
+             break
+         max_window_length += 1
+     return max_window_length
+
+
+ def movement_gen_story_slide_windows(id_prompt, frame_prompt_list, pipe, window_length, seed, unet_controller: Optional[UNetController], save_dir, verbose=True):
+     max_window_length = get_max_window_length(unet_controller, id_prompt, frame_prompt_list)
+     window_length = min(window_length, max_window_length)
+     if window_length < len(frame_prompt_list):
+         movement_lists = circular_sliding_windows(frame_prompt_list, window_length)
+     else:
+         movement_lists = [movement for movement in frame_prompt_list]
+     story_images = []
+
+     if verbose:
+         print("seed:", seed)
+     generate = torch.Generator().manual_seed(seed)
+     unet_controller.id_prompt = id_prompt
+
+     for index, movement in enumerate(frame_prompt_list):
+         if unet_controller is not None:
+             if window_length < len(frame_prompt_list):
+                 unet_controller.frame_prompt_suppress = movement_lists[index][1:]
+                 unet_controller.frame_prompt_express = movement_lists[index][0]
+                 gen_prompts = [f'{id_prompt} {" ".join(movement_lists[index])}']
+             else:
+                 unet_controller.frame_prompt_suppress = movement_lists[:index] + movement_lists[index + 1:]
+                 unet_controller.frame_prompt_express = movement_lists[index]
+                 gen_prompts = [f'{id_prompt} {" ".join(movement_lists)}']
+
+             if verbose:
+                 print(f"suppress: {unet_controller.frame_prompt_suppress}")
+                 print(f"express: {unet_controller.frame_prompt_express}")
+                 print(f'id_prompt: {id_prompt}')
+                 print(f"gen_prompts: {gen_prompts}")
+         else:
+             gen_prompts = f'{id_prompt} {movement}'
+
+         if unet_controller is not None and unet_controller.Use_same_init_noise is True:
+             generate = torch.Generator().manual_seed(seed)
+
+         images = pipe(gen_prompts, generator=generate, unet_controller=unet_controller).images
+         story_images.append(images[0])
+         images[0].save(os.path.join(save_dir, f'{id_prompt} {unet_controller.frame_prompt_express}.jpg'))
+
+     image_array_list = [np.array(pil_img) for pil_img in story_images]
+
+     # Concatenate images horizontally
+     story_image = np.concatenate(image_array_list, axis=1)
+     story_image = Image.fromarray(story_image.astype(np.uint8))
+
+     if unet_controller.Save_story_image:
+         story_image.save(os.path.join(save_dir, f'story_image_{id_prompt}.jpg'))
+
+     return story_images, story_image
+
+
+ # This function sets batch > 1 to generate multiple images at once
+ def movement_gen_story_slide_windows_batch(id_prompt, frame_prompt_list, pipe, window_length, seed, unet_controller: Optional[UNetController], save_dir, batch_size=3):
+     max_window_length = get_max_window_length(unet_controller, id_prompt, frame_prompt_list)
+     window_length = min(window_length, max_window_length)
+     if window_length < len(frame_prompt_list):
+         movement_lists = circular_sliding_windows(frame_prompt_list, window_length)
+     else:
+         movement_lists = [movement for movement in frame_prompt_list]
+     story_images = []
+
+     print("seed:", seed)
+     generate = torch.Generator().manual_seed(seed)
+     unet_controller.id_prompt = id_prompt
+
+     gen_prompt_info_list = []
+     gen_prompt = None
+     for index, _ in enumerate(frame_prompt_list):
+         if window_length < len(frame_prompt_list):
+             frame_prompt_suppress = movement_lists[index][1:]
+             frame_prompt_express = movement_lists[index][0]
+             gen_prompt = f'{id_prompt} {" ".join(movement_lists[index])}'
+         else:
+             frame_prompt_suppress = movement_lists[:index] + movement_lists[index + 1:]
+             frame_prompt_express = movement_lists[index]
+             gen_prompt = f'{id_prompt} {" ".join(movement_lists)}'
+
+         gen_prompt_info_list.append({'frame_prompt_suppress': frame_prompt_suppress, 'frame_prompt_express': frame_prompt_express})
+
+     story_images = []
+     for i in range(0, len(gen_prompt_info_list), batch_size):
+         batch = gen_prompt_info_list[i:i + batch_size]
+         gen_prompts = [gen_prompt for _ in batch]
+         unet_controller.frame_prompt_express_list = [gen_prompt_info['frame_prompt_express'] for gen_prompt_info in batch]
+         unet_controller.frame_prompt_suppress_list = [gen_prompt_info['frame_prompt_suppress'] for gen_prompt_info in batch]
+
+         if unet_controller is not None and unet_controller.Use_same_init_noise is True:
+             generate = torch.Generator().manual_seed(seed)
+
+         images = pipe(gen_prompts, generator=generate, unet_controller=unet_controller).images
+         for index, image in enumerate(images):
+             story_images.append(image)
+             image.save(os.path.join(save_dir, f'{id_prompt} {unet_controller.frame_prompt_express_list[index]}.jpg'))
+
+     image_array_list = [np.array(pil_img) for pil_img in story_images]
+
+     # Concatenate images horizontally
+     story_image = np.concatenate(image_array_list, axis=1)
+     story_image = Image.fromarray(story_image.astype(np.uint8))
+
+     if unet_controller.Save_story_image:
+         story_image.save(os.path.join(save_dir, 'story_image.jpg'))
+
+     return story_images, story_image
+
+
+ def prompt2tokens(tokenizer, prompt):
+     text_inputs = tokenizer(
+         prompt,
+         padding="max_length",
+         max_length=tokenizer.model_max_length,
+         truncation=True,
+         return_tensors="pt",
+     )
+     text_input_ids = text_inputs.input_ids
+     tokens = []
+     for text_input_id in text_input_ids[0]:
+         token = tokenizer.decoder[text_input_id.item()]
+         tokens.append(token)
+     return tokens
+
+
+ def punish_wight(tensor, latent_size, alpha=1.0, beta=1.2, calc_similarity=False):
+     u, s, vh = torch.linalg.svd(tensor)
+     u = u[:, :latent_size]
+     zero_idx = int(latent_size * alpha)
+
+     if calc_similarity:
+         _s = s.clone()
+         _s *= torch.exp(-alpha * _s) * beta
+         _s[zero_idx:] = 0
+         _tensor = u @ torch.diag(_s) @ vh
+         dist = cdist(tensor[:, 0].unsqueeze(0).cpu(), _tensor[:, 0].unsqueeze(0).cpu(), metric='cosine')
+         print(f'The distance between the word embedding before and after the punishment: {dist}')
+     s *= torch.exp(-alpha * s) * beta
+     tensor = u @ torch.diag(s) @ vh
+     return tensor
+
+
+ def swr_single_prompt_embeds(swr_words, prompt_embeds, prompt, tokenizer, alpha=1.0, beta=1.2, zero_eot=False):
+     punish_indices = []
+
+     prompt_tokens = prompt2tokens(tokenizer, prompt)
+
+     words_tokens = prompt2tokens(tokenizer, swr_words)
+     words_tokens = [word for word in words_tokens if word != '<|endoftext|>' and word != '<|startoftext|>']
+     index_of_words = find_sublist_index(prompt_tokens, words_tokens)
+
+     if index_of_words != -1:
+         punish_indices.extend([num for num in range(index_of_words, index_of_words + len(words_tokens))])
+
+     if zero_eot:
+         eot_indices = [index for index, word in enumerate(prompt_tokens) if word == '<|endoftext|>']
+         prompt_embeds[eot_indices] *= 9e-1
+     else:
+         punish_indices.extend([index for index, word in enumerate(prompt_tokens) if word == '<|endoftext|>'])
+
+     punish_indices = list(set(punish_indices))
+
+     wo_batch = prompt_embeds[punish_indices]
+     wo_batch = punish_wight(wo_batch.T.to(float), wo_batch.size(0), alpha=alpha, beta=beta, calc_similarity=False).T.to(prompt_embeds.dtype)
+
+     prompt_embeds[punish_indices] = wo_batch
+
+
+ def find_sublist_index(list1, list2):
+     for i in range(len(list1) - len(list2) + 1):
+         if list1[i:i + len(list2)] == list2:
+             return i
+     return -1  # sublist not found
+
+
+ def fourier_filter(x_in: "torch.Tensor", threshold: int, scale: int) -> "torch.Tensor":
+     """Fourier filter as introduced in FreeU (https://arxiv.org/abs/2309.11497).
+
+     This version of the method comes from here:
+     https://github.com/huggingface/diffusers/pull/5164#issuecomment-1732638706
+     """
+     x = x_in
+     B, C, H, W = x.shape
+
+     x = x.to(dtype=torch.float32)
+
+     # FFT
+     x_freq = fftn(x, dim=(-2, -1))
+     x_freq = fftshift(x_freq, dim=(-2, -1))
+
+     B, C, H, W = x_freq.shape
+     mask = torch.ones((B, C, H, W), device=x.device)
+
+     crow, ccol = H // 2, W // 2
+     mask[..., crow - threshold : crow + threshold, ccol - threshold : ccol + threshold] = scale
+     x_freq = x_freq * mask
+
+     # IFFT
+     x_freq = ifftshift(x_freq, dim=(-2, -1))
+     x_filtered = ifftn(x_freq, dim=(-2, -1)).real
+
+     return x_filtered.to(dtype=x_in.dtype)
+
+
+ def apply_freeu(
+     resolution_idx: int, hidden_states: "torch.Tensor", res_hidden_states: "torch.Tensor", **freeu_kwargs
+ ) -> Tuple["torch.Tensor", "torch.Tensor"]:
+     """Applies the FreeU mechanism as introduced in https://arxiv.org/abs/2309.11497.
+     Adapted from the official code repository: https://github.com/ChenyangSi/FreeU.
+
+     Args:
+         resolution_idx (`int`): Integer denoting the UNet block where FreeU is being applied.
+         hidden_states (`torch.Tensor`): Inputs to the underlying block.
+         res_hidden_states (`torch.Tensor`): Features from the skip block corresponding to the underlying block.
+         s1 (`float`): Scaling factor for stage 1 to attenuate the contributions of the skip features.
+         s2 (`float`): Scaling factor for stage 2 to attenuate the contributions of the skip features.
+         b1 (`float`): Scaling factor for stage 1 to amplify the contributions of backbone features.
+         b2 (`float`): Scaling factor for stage 2 to amplify the contributions of backbone features.
+     """
+     if resolution_idx == 0:
+         num_half_channels = hidden_states.shape[1] // 2
+         hidden_states[:, :num_half_channels] = hidden_states[:, :num_half_channels] * freeu_kwargs["b1"]
+         res_hidden_states = fourier_filter(res_hidden_states, threshold=1, scale=freeu_kwargs["s1"])
+     if resolution_idx == 1:
+         num_half_channels = hidden_states.shape[1] // 2
+         hidden_states[:, :num_half_channels] = hidden_states[:, :num_half_channels] * freeu_kwargs["b2"]
+         res_hidden_states = fourier_filter(res_hidden_states, threshold=1, scale=freeu_kwargs["s2"])
+
+     return hidden_states, res_hidden_states
+
+
+ def circular_sliding_windows(lst, w):
+     n = len(lst)
+     windows = []
+     for i in range(n):
+         window = [lst[(i + j) % n] for j in range(w)]
+         windows.append(window)
+     return windows
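
Taken together, these helpers form the inference path: load an SDXL checkpoint into the custom pipeline, attach a controller, then render one frame per prompt with the sliding-window scheme. The sketch below is a hypothetical driver under assumed paths and prompts; the repository's real entry point (e.g. an app.py or main.py) is not part of this diff.

```python
# Hypothetical driver for sliding-window story generation (paths and prompts are assumptions).
import os
import torch
from unet.unet_controller import UNetController
from unet.utils import load_pipe_from_path, movement_gen_story_slide_windows

model_path = "/path/to/stable-diffusion-xl-base-1.0"   # assumed local checkpoint directory
pipe, model_name = load_pipe_from_path(model_path, "cuda", torch.float16, "fp16")

controller = UNetController()
controller.tokenizer = pipe.tokenizer                  # needed by apply_embeds_mask
save_dir = "result"
os.makedirs(save_dir, exist_ok=True)
controller.result_save_dir = save_dir

id_prompt = "A cute cartoon fox"
frame_prompt_list = ["is reading a book", "is riding a bicycle", "is sleeping under a tree"]

story_images, story_image = movement_gen_story_slide_windows(
    id_prompt, frame_prompt_list, pipe, window_length=10, seed=42,
    unet_controller=controller, save_dir=save_dir,
)
```

For intuition on the windowing itself: `circular_sliding_windows(['a', 'b', 'c'], 2)` yields `[['a', 'b'], ['b', 'c'], ['c', 'a']]`, which is how each frame's expressed prompt is paired with the prompts to be suppressed.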