Generating an image takes way too long

#45
by MirrorReflection - opened

Hello, I am completely new to this.
I have an RTX 3070, 16 GB of RAM, and a Ryzen 2600; I believe those are all the relevant technical specifications.
I was running ComfyUI with the NVIDIA run .bat file.

I was just trying things out to see how they would work. When I started yesterday, I thought it was broken or that I had installed it wrong, since it sat at 0% the entire time. I was using Stable Diffusion 3.5 Large.
Today I tested it again with SD 3.5 Large Turbo. It still took a long time, but after a while it jumped straight to 15%. For the first image I just tried to generate a banana, and that took roughly 50 minutes.

I noticed that loading the safetensors file in the Load Checkpoint node took up a large share of those 50 minutes of overall runtime.
After that first picture was through, the following pictures definitely took less time, but still a lot.
I will post images of all the relevant information I could find. Another very weird thing: when I was observing my CPU and GPU usage, both were pretty low, while my RAM usage was maxed out whenever I tried to generate a picture.
Either something is really wrong, or my hardware is just too weak to handle it.

It would be a great help if someone could help me out. One more thing: I saw someone else with a very similar issue, but I wasn't sure whether I should ask there or start my own discussion. I also wanted to mention that I am controlling everything through ComfyUI, so I did not start or set anything up in Python; I don't think I can follow the same steps, and I have no idea how to use Python properly anyway.

Issue1.PNG

Issue2.PNG

Issue3.PNG

@MirrorReflection Neither SD 3.5 Large Turbo nor the normal SD 3.5 Large fits in 16 GB of VRAM; they require at least 24 GB. The only reason it works at all is that it spills over into system RAM, which massively slows down inference.
You can use quantization, which considerably lowers the VRAM requirements.
Quantized SD 3.5 Large: https://huggingface.co./city96/stable-diffusion-3.5-large-gguf

Q8 is probably the best fit for you: it fits in 16 GB of VRAM and is basically lossless in quality.

There's also Q4, which fits in 8 GB of VRAM but loses a bit of detail.
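
To see why those quantization levels fit where they do, here is a rough back-of-envelope sketch. The ~8B parameter count and the bits-per-weight figures are my approximations, not numbers from this thread:

```python
# Rough memory math for the weights alone (approximate figures).
# SD 3.5 Large's diffusion transformer has roughly 8B parameters.
PARAMS = 8.0e9

def weight_gib(bits_per_param: float) -> float:
    """Approximate weight size in GiB at a given quantization level."""
    return PARAMS * bits_per_param / 8 / 1024**3

# GGUF Q8_0 stores roughly 8.5 bits per weight, Q4_0 roughly 4.5
# (the extra half bit covers the per-block scale factors).
for name, bits in [("fp16", 16.0), ("Q8_0", 8.5), ("Q4_0", 4.5)]:
    print(f"{name}: ~{weight_gib(bits):.1f} GiB")

# fp16: ~14.9 GiB -> no headroom on a 16 GiB card once text encoders
#                    and activations are added, so it spills into RAM
# Q8_0: ~7.9 GiB  -> fits in 16 GiB of VRAM with room to spare
# Q4_0: ~4.2 GiB  -> fits even in 8 GiB of VRAM
```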

@YaTharThShaRma999 Thank you for your quick reply. It already helps to know that my hardware isn't quite enough to take that on. I will try it and report back. I just have one more question: where do I put that file?

@YaTharThShaRma999 Quick question. I have read a little about this quantized version, and I saw that I would need to install a custom ComfyUI version for it. Or can I just use it in normal ComfyUI? If so, where exactly do I have to put it? I'm only asking because the download takes a while.

It seems to be working faster now, at least at the beginning. I have been using this as my base: https://huggingface.co./calcuis/sd3.5-large-gguf.
It is the same issue as before, and I have also tried the Q4 variant; it is still maxing out the RAM and not using the GPU. I don't know where I went wrong; I followed all the steps in the instructions.
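
In case it helps anyone debugging the same symptom, a minimal sanity check is to ask PyTorch whether it can see the GPU at all; a CPU-only PyTorch build would produce exactly this behavior (everything running from system RAM). This is only a sketch, and it assumes you run it with the same Python that ComfyUI launches (e.g. `python_embeded\python.exe` in a portable install), which is not confirmed anywhere in this thread:

```python
# check_gpu.py - run with the Python interpreter that ComfyUI itself uses.
import torch

print("torch version:", torch.__version__)
print("built with CUDA:", torch.version.cuda)  # None means a CPU-only build
print("CUDA available:", torch.cuda.is_available())

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print("device:", props.name)
    print("total VRAM:", round(props.total_memory / 1024**3, 1), "GiB")
else:
    # If this branch runs, inference falls back to CPU and system RAM,
    # which matches the maxed-out RAM / idle GPU described above.
    print("PyTorch cannot see the GPU; a CUDA-enabled build is needed.")
```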
