Qwen/Qwen2-VL-7B-Instruct · RuntimeError: CUDA error: operation not permitted when stream is capturing

Exception in callback _raise_exception_on_finish(<Future finis...sertions.\n')>) at /usr/local/lib/python3.10/dist-packages/lmdeploy/vl/engine.py:20
handle: <Handle _raise_exception_on_finish(<Future finis...sertions.\n')>) at /usr/local/lib/python3.10/dist-packages/lmdeploy/vl/engine.py:20>
Traceback (most recent call last):
  File "/usr/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.10/dist-packages/lmdeploy/vl/engine.py", line 27, in _raise_exception_on_finish
    raise e
  File "/usr/local/lib/python3.10/dist-packages/lmdeploy/vl/engine.py", line 23, in _raise_exception_on_finish
    task.result()
  File "/usr/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.10/dist-packages/lmdeploy/vl/engine.py", line 169, in forward
    outputs = self.model.forward(*func_inputs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/lmdeploy/vl/model/qwen2.py", line 97, in forward
    pixel_values = image_inputs['pixel_values'].to(
RuntimeError: CUDA error: operation not permitted when stream is capturing
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
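
The log itself suggests rerunning with CUDA_LAUNCH_BLOCKING=1, so here is a minimal sketch of how that could be done from Python. It assumes the variable is set at the very top of the launch script, before torch or lmdeploy is imported and before the process makes any CUDA call. This is only a diagnostic aid to get a more trustworthy stack trace, not a fix for the capture error.

```python
import os

# Set this before importing torch / lmdeploy: the CUDA runtime reads the
# variable when it initializes, so changing it later has no effect.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Assumption: the imports and the code that reproduces the error would
# follow here, for example:
# import torch
# import lmdeploy

print("CUDA_LAUNCH_BLOCKING =", os.environ["CUDA_LAUNCH_BLOCKING"])
```

Exporting the variable in the shell before starting the process should have the same effect. With blocking launches enabled, kernel errors are reported at the call that triggered them, which makes it easier to check whether the `.to(` call in qwen2.py is really where the failure originates.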