Multiple GPU support
I'm pretty sure your question is answered here: "This is achieved by making Spaces efficiently hold and release GPUs as needed (as opposed to a classical GPU Space that holds exactly one GPU at any point in time)", which is showcased in this GIF.
However, I'm not quite sure you can create a Mixtral demo with ZeroGPU; I could be wrong, though. You may give it a try.
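For reference, here is a minimal sketch of what a ZeroGPU Space typically looks like with the `spaces` package: the GPU is requested when the decorated function is called and released afterwards. The model id is just a small placeholder for illustration, not a Mixtral checkpoint.

```python
import spaces
import gradio as gr
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model id for illustration; a full unquantized Mixtral checkpoint
# would not fit in a single ZeroGPU allocation.
model_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

@spaces.GPU  # the GPU is held only while this function runs, then released
def generate(prompt: str) -> str:
    model.to("cuda")
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    output = model.generate(**inputs, max_new_tokens=256)
    return tokenizer.decode(output[0], skip_special_tokens=True)

gr.Interface(fn=generate, inputs="text", outputs="text").launch()
```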
Thank you for your answer!
To run Mixtral without quantization or offloading, 3 or 4 A100s are required, but that seems impossible with the current ZeroGPU.
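Just to sketch the arithmetic behind that estimate (assuming the commonly cited ~46.7B total parameters for Mixtral-8x7B):

```python
# Back-of-the-envelope memory estimate for unquantized Mixtral-8x7B.
params = 46.7e9                        # assumed total parameter count
weights_gb = params * 2 / 1e9          # two bytes per parameter in fp16/bf16 -> ~93 GB
print(f"~{weights_gb:.0f} GB of weights alone")
print(f"~{weights_gb / 40:.1f} x A100 40GB just for the weights")  # ~2.3, more with KV cache
```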
I think your explanation is correct.
Again, thank you!
You are welcome!
Though you could try to use a Mixtral GGUF from TheBloke's Mixtral-8x7B-instruct-v0.1 quants or TheBloke's Dolphin-Mixtral-8x7B quants. I'm sure you'll be able to find a Space with ZeroGPU that runs GGUF quants and play around with it.
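As a rough sketch (the repo id and filename below are assumptions and should be double-checked against TheBloke's model cards), loading one of those GGUF quants with llama-cpp-python could look like this:

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Assumed repo id and filename for a 4-bit Mixtral instruct quant.
gguf_path = hf_hub_download(
    repo_id="TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF",
    filename="mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf",
)

# Offload all layers to the GPU if one is available (n_gpu_layers=-1).
llm = Llama(model_path=gguf_path, n_ctx=4096, n_gpu_layers=-1)
result = llm("Q: What is Mixtral? A:", max_tokens=128)
print(result["choices"][0]["text"])
```

A 4-bit Mixtral quant is on the order of 25-30 GB, so it is much closer to what a single large GPU can hold than the unquantized weights.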