Unusable on ZeroGPU

#1
by adamelliotfields - opened

The app times out waiting for the model to load. My ZeroGPU balance depletes before I can generate an image.

A big challenge has been trying to understand what happens during the ZeroGPU lifecycle. When a model is loaded from disk straight to GPU, where does it go once the GPU is discarded? Do these ephemeral A100s stick around for a bit to handle more requests? Is SDXL in a Gradio queue deployment with a concurrency limit of 2 too much for an A100?

First, investigate a better error message when timeouts occur. Users should not be directed to retry immediately when the system is experiencing degraded performance. One idea is to compare the requested GPU duration against the quota timer before dispatching.
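A rough sketch of what that error path could look like, assuming the generation call is wrapped with `@spaces.GPU(duration=...)` and that ZeroGPU quota/timeout failures can be recognized loosely from the exception message (the exact exception type and wording are assumptions here):

```python
import gradio as gr
import spaces

REQUESTED_DURATION = 40  # seconds requested from ZeroGPU (placeholder value)

@spaces.GPU(duration=REQUESTED_DURATION)
def generate(prompt: str):
    ...  # run the diffusers pipeline here and return the image

def generate_safe(prompt: str):
    try:
        return generate(prompt)
    except Exception as exc:  # quota/timeout errors surface as generic exceptions
        message = str(exc).lower()
        if "quota" in message or "timeout" in message:
            # Don't ask users to retry immediately while the queue is degraded.
            raise gr.Error(
                f"Could not get a GPU for the requested {REQUESTED_DURATION}s. "
                "Please wait a few minutes before trying again."
            ) from exc
        raise
```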

If a reasonable solution can't be found, then the app will have to run in a "single-model" mode on ZeroGPU without the refiner. Check for the Spaces environment variable and operate normally on regular hardware. Use one of the Segmind models as default.
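A minimal sketch of that fallback, assuming ZeroGPU Spaces expose an environment variable like `SPACES_ZERO_GPU` (the exact variable name should be verified) and using Segmind Vega as the lightweight default:

```python
import os

# Assumption: ZeroGPU Spaces set an env var such as SPACES_ZERO_GPU.
IS_ZERO_GPU = os.environ.get("SPACES_ZERO_GPU", "").lower() in ("1", "true")

if IS_ZERO_GPU:
    # Single-model mode: skip the refiner, default to a small Segmind model.
    DEFAULT_MODEL = "segmind/Segmind-Vega"
    ENABLE_REFINER = False
else:
    # Regular hardware: full SDXL base plus refiner.
    DEFAULT_MODEL = "stabilityai/stable-diffusion-xl-base-1.0"
    ENABLE_REFINER = True
```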

adamelliotfields changed discussion title from Performance improvements to Unusable on ZeroGPU
  • Added the base vae_1_0/config.json to preloaded files.
  • Disabled the refiner by default.
  • Refined the GPU duration request: it now scales with the number of pixels and still overestimates to account for variance (see the sketch after this list).
  • Added Segmind's new Vega model and made it the default. Can load and generate in under 10 seconds now!
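Here is roughly what the pixel-aware duration request looks like. The coefficients are made-up placeholders, and the dynamic-duration callable (a function passed as `duration` that receives the same arguments as the decorated function) should be double-checked against the current `spaces` package:

```python
import spaces

BASE_SECONDS = 15            # assumed fixed cost: model load and warm-up
SECONDS_PER_MEGAPIXEL = 12   # assumed per-megapixel inference cost

def estimate_duration(prompt: str, width: int, height: int, num_images: int = 1) -> int:
    megapixels = (width * height) / 1_000_000
    seconds = BASE_SECONDS + SECONDS_PER_MEGAPIXEL * megapixels * num_images
    return int(seconds * 1.25)  # overestimate rather than get cut off mid-generation

@spaces.GPU(duration=estimate_duration)
def generate(prompt: str, width: int, height: int, num_images: int = 1):
    ...  # run the pipeline at the requested resolution
```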

I confirmed that the vae_1_0 config is downloaded if it's not in the cache. The 0.9 version is what's in the checkpoint's actual vae folder. The configs are downloaded but not the safetensors. I do keep the 0.9 checkpoint in preloaded files just in case, but it should never be loaded since we always use Ollin's VAE. If there were network issues, the pipeline may have been timing out waiting for that file.
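For reference, swapping in Ollin's fp16-fix VAE (`madebyollin/sdxl-vae-fp16-fix`) means only the checkpoint's VAE config is ever needed, never its safetensors. A minimal diffusers sketch:

```python
import torch
from diffusers import AutoencoderKL, StableDiffusionXLPipeline

# Always use the fp16-fix VAE instead of the checkpoint's own VAE weights.
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix",
    torch_dtype=torch.float16,
)

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    vae=vae,
    torch_dtype=torch.float16,
    variant="fp16",
)
```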

Closed by 6b66635 and fe94951.

adamelliotfields changed discussion status to closed
