RuntimeError: Unexpected error from cudaGetDeviceCount().
#142, opened by yslan
My ZeroGPU space had been running smoothly for a while, but it recently started failing with the following error:
INFO:httpx:HTTP Request: POST http://device-api.zero/schedule?cgroupPath=%2Fkubepods.slice%2Fkubepods-burstable.slice%2Fkubepods-burstable-pod41a37386_fb79_48be_813e_3491fe073ed7.slice%2Fcri-containerd-8875168c6e81907b4da8159a3083690d62e67a68e191030629d2e5930b4cf386.scope&taskId=140155930786352&enableQueue=true&durationSeconds=50&token=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpcCI6Ijg5LjIxMy4xNzkuMTM0IiwidXNlciI6Im1vbW95YTIwMDAiLCJ1dWlkIjpudWxsLCJleHAiOjE3MzQ3NjQ1MTl9.cp37zxJwWUNgMLbVZkKD4IXgL-r43kGlO7aUxiHIIZc "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://device-api.zero/allow?allowToken=443a4c9ec107e9806f9a39274fe5b759fdd2cf9488f8d59c5937eb6e6304e835&pid=1247 "HTTP/1.1 200 OK"
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 135, in worker_init
    torch.init(nvidia_uuid)
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/torch/patching.py", line 350, in init
    torch.Tensor([0]).cuda()
  File "/usr/local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 298, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 304: OS call failed or operation not supported on this OS
I have no idea how to deal with this, since I have not made any changes to the code. Any suggestions?
A similar question is posted here: https://huggingface.co./spaces/zero-gpu-explorers/README/discussions/138
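For anyone trying to narrow this down, here is a minimal sketch of how the failing step could be probed in isolation. It assumes PyTorch is installed. `probe_cuda` is a hypothetical helper of my own, not part of the `spaces` package. It repeats the call shown in the traceback (`torch.Tensor([0]).cuda()`) and returns the error message instead of crashing. On ZeroGPU it would presumably need to be called from inside a GPU-decorated function to be meaningful.

```python
def probe_cuda():
    """Hypothetical helper: attempt CUDA initialization and report the outcome."""
    try:
        import torch
    except Exception as exc:  # torch missing or failing to import
        return {"status": "no_torch", "detail": str(exc)}

    try:
        # Same call that appears in the traceback; it triggers torch's lazy CUDA init.
        torch.Tensor([0]).cuda()
    except Exception as exc:  # e.g. RuntimeError from cudaGetDeviceCount()
        return {"status": "cuda_error", "detail": str(exc)}

    # torch.version.cuda is the CUDA version torch was built against (None for CPU builds).
    return {"status": "ok", "detail": str(torch.version.cuda)}


if __name__ == "__main__":
    print(probe_cuda())
```

If this reports `cuda_error` with the same "Error 304" text, the failure is likely in the environment (driver or container setup) rather than in the application code, though that is only a guess from the log above.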