vLLM fails to start with Pixtral-Large-Instruct-2411: CUDA illegal memory access

#13
by GarrickLin - opened

The command I ran:

vllm serve /models/Pixtral-Large-Instruct-2411 --served-model-name mistralai/Pixtral-Large-Instruct-2411 --config-format mistral --load-format mistral --tokenizer_mode mistral --limit_mm_per_prompt 'image=10' --tensor-parallel-size 8

The error, raised at startup during the profiling run that sizes the KV cache:

INFO 11-21 19:46:53 model_runner_base.py:120] Writing input of failed execution to /tmp/err_execute_model_input_20241121-194653.pkl...
WARNING 11-21 19:46:53 model_runner_base.py:143] Failed to pickle inputs of failed execution: CUDA error: an illegal memory access was encountered
WARNING 11-21 19:46:53 model_runner_base.py:143] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
WARNING 11-21 19:46:53 model_runner_base.py:143] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
WARNING 11-21 19:46:53 model_runner_base.py:143] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
WARNING 11-21 19:46:53 model_runner_base.py:143] 
ERROR 11-21 19:46:53 engine.py:366] Error in model execution: CUDA error: an illegal memory access was encountered
ERROR 11-21 19:46:53 engine.py:366] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
ERROR 11-21 19:46:53 engine.py:366] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
ERROR 11-21 19:46:53 engine.py:366] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
ERROR 11-21 19:46:53 engine.py:366] Traceback (most recent call last):
ERROR 11-21 19:46:53 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner_base.py", line 116, in _wrapper
ERROR 11-21 19:46:53 engine.py:366]     return func(*args, **kwargs)
ERROR 11-21 19:46:53 engine.py:366]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 11-21 19:46:53 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner.py", line 1654, in execute_model
ERROR 11-21 19:46:53 engine.py:366]     hidden_or_intermediate_states = model_executable(
ERROR 11-21 19:46:53 engine.py:366]                                     ^^^^^^^^^^^^^^^^^
ERROR 11-21 19:46:53 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
ERROR 11-21 19:46:53 engine.py:366]     return self._call_impl(*args, **kwargs)
ERROR 11-21 19:46:53 engine.py:366]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-21 19:46:53 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
ERROR 11-21 19:46:53 engine.py:366]     return forward_call(*args, **kwargs)
ERROR 11-21 19:46:53 engine.py:366]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-21 19:46:53 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/pixtral.py", line 211, in forward
ERROR 11-21 19:46:53 engine.py:366]     vision_embeddings = self._process_image_input(image_input)
ERROR 11-21 19:46:53 engine.py:366]                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-21 19:46:53 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/pixtral.py", line 261, in _process_image_input
ERROR 11-21 19:46:53 engine.py:366]     return self.vision_language_adapter(self.vision_encoder(image_input))
ERROR 11-21 19:46:53 engine.py:366]                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-21 19:46:53 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
ERROR 11-21 19:46:53 engine.py:366]     return self._call_impl(*args, **kwargs)
ERROR 11-21 19:46:53 engine.py:366]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-21 19:46:53 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
ERROR 11-21 19:46:53 engine.py:366]     return forward_call(*args, **kwargs)
ERROR 11-21 19:46:53 engine.py:366]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-21 19:46:53 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/pixtral.py", line 572, in forward
ERROR 11-21 19:46:53 engine.py:366]     positions = position_meshgrid(patch_embeds_list).to(self.device)
ERROR 11-21 19:46:53 engine.py:366]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-21 19:46:53 engine.py:366] RuntimeError: CUDA error: an illegal memory access was encountered
ERROR 11-21 19:46:53 engine.py:366] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
ERROR 11-21 19:46:53 engine.py:366] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
ERROR 11-21 19:46:53 engine.py:366] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
ERROR 11-21 19:46:53 engine.py:366] 
ERROR 11-21 19:46:53 engine.py:366] 
ERROR 11-21 19:46:53 engine.py:366] The above exception was the direct cause of the following exception:
ERROR 11-21 19:46:53 engine.py:366] 
ERROR 11-21 19:46:53 engine.py:366] Traceback (most recent call last):
ERROR 11-21 19:46:53 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 357, in run_mp_engine
ERROR 11-21 19:46:53 engine.py:366]     engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
ERROR 11-21 19:46:53 engine.py:366]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-21 19:46:53 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 119, in from_engine_args
ERROR 11-21 19:46:53 engine.py:366]     return cls(ipc_path=ipc_path,
ERROR 11-21 19:46:53 engine.py:366]            ^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-21 19:46:53 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 71, in __init__
ERROR 11-21 19:46:53 engine.py:366]     self.engine = LLMEngine(*args, **kwargs)
ERROR 11-21 19:46:53 engine.py:366]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-21 19:46:53 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py", line 350, in __init__
ERROR 11-21 19:46:53 engine.py:366]     self._initialize_kv_caches()
ERROR 11-21 19:46:53 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py", line 487, in _initialize_kv_caches
ERROR 11-21 19:46:53 engine.py:366]     self.model_executor.determine_num_available_blocks())
ERROR 11-21 19:46:53 engine.py:366]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-21 19:46:53 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/distributed_gpu_executor.py", line 39, in determine_num_available_blocks
ERROR 11-21 19:46:53 engine.py:366]     num_blocks = self._run_workers("determine_num_available_blocks", )
ERROR 11-21 19:46:53 engine.py:366]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-21 19:46:53 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/multiproc_gpu_executor.py", line 195, in _run_workers
ERROR 11-21 19:46:53 engine.py:366]     driver_worker_output = driver_worker_method(*args, **kwargs)
ERROR 11-21 19:46:53 engine.py:366]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-21 19:46:53 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 11-21 19:46:53 engine.py:366]     return func(*args, **kwargs)
ERROR 11-21 19:46:53 engine.py:366]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 11-21 19:46:53 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker.py", line 195, in determine_num_available_blocks
ERROR 11-21 19:46:53 engine.py:366]     self.model_runner.profile_run()
ERROR 11-21 19:46:53 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 11-21 19:46:53 engine.py:366]     return func(*args, **kwargs)
ERROR 11-21 19:46:53 engine.py:366]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 11-21 19:46:53 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner.py", line 1316, in profile_run
ERROR 11-21 19:46:53 engine.py:366]     self.execute_model(model_input, kv_caches, intermediate_tensors)
ERROR 11-21 19:46:53 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 11-21 19:46:53 engine.py:366]     return func(*args, **kwargs)
ERROR 11-21 19:46:53 engine.py:366]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 11-21 19:46:53 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner_base.py", line 146, in _wrapper
ERROR 11-21 19:46:53 engine.py:366]     raise type(err)(f"Error in model execution: "
ERROR 11-21 19:46:53 engine.py:366] RuntimeError: Error in model execution: CUDA error: an illegal memory access was encountered
ERROR 11-21 19:46:53 engine.py:366] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
ERROR 11-21 19:46:53 engine.py:366] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
ERROR 11-21 19:46:53 engine.py:366] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

vLLM version: 0.6.4.post1+cu124
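
In case it helps anyone reading the log, here is a minimal sketch for stripping the logger prefix and pulling out the last frame before the first exception. The helper names `strip_log_prefix` and `first_exception_frame` are made up for this post and are not part of vLLM. The regex only assumes the prefix format seen in the log above. As the log itself warns, CUDA errors can be reported asynchronously, so the frame it finds may not be the real culprit.

```python
import re

# Logger prefix as it appears in the log above, e.g.
# "ERROR 11-21 19:46:53 engine.py:366] " (format assumed from this log only).
LOG_PREFIX = re.compile(
    r"^(?:INFO|WARNING|ERROR|DEBUG) \d{2}-\d{2} \d{2}:\d{2}:\d{2} \S+:\d+\] ?"
)


def strip_log_prefix(lines):
    """Return the lines with the logger prefix removed (hypothetical helper)."""
    return [LOG_PREFIX.sub("", line.rstrip("\n")) for line in lines]


def first_exception_frame(lines):
    """Return the last 'File ...' line seen before the first RuntimeError line."""
    last_frame = None
    for line in strip_log_prefix(lines):
        stripped = line.strip()
        if stripped.startswith("File "):
            last_frame = stripped
        elif stripped.startswith("RuntimeError"):
            return last_frame
    return None


if __name__ == "__main__":
    # Three lines copied from the log above.
    sample = [
        'ERROR 11-21 19:46:53 engine.py:366]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/pixtral.py", line 572, in forward',
        "ERROR 11-21 19:46:53 engine.py:366]     positions = position_meshgrid(patch_embeds_list).to(self.device)",
        "ERROR 11-21 19:46:53 engine.py:366] RuntimeError: CUDA error: an illegal memory access was encountered",
    ]
    print(first_exception_frame(sample))
```

On the full log this should point at the `pixtral.py` frame in the vision encoder's `forward`. Rerunning with `CUDA_LAUNCH_BLOCKING=1`, as the log suggests, would be the way to confirm that location.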