Running vLLM fails with a CUDA illegal memory access error
#13 opened by GarrickLin
I ran:
vllm serve /models/Pixtral-Large-Instruct-2411 --served-model-name mistralai/Pixtral-Large-Instruct-2411 --config-format mistral --load-format mistral --tokenizer_mode mistral --limit_mm_per_prompt 'image=10' --tensor-parallel-size 8
It failed with this error:
INFO 11-21 19:46:53 model_runner_base.py:120] Writing input of failed execution to /tmp/err_execute_model_input_20241121-194653.pkl...
WARNING 11-21 19:46:53 model_runner_base.py:143] Failed to pickle inputs of failed execution: CUDA error: an illegal memory access was encountered
WARNING 11-21 19:46:53 model_runner_base.py:143] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
WARNING 11-21 19:46:53 model_runner_base.py:143] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
WARNING 11-21 19:46:53 model_runner_base.py:143] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
WARNING 11-21 19:46:53 model_runner_base.py:143]
ERROR 11-21 19:46:53 engine.py:366] Error in model execution: CUDA error: an illegal memory access was encountered
ERROR 11-21 19:46:53 engine.py:366] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
ERROR 11-21 19:46:53 engine.py:366] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
ERROR 11-21 19:46:53 engine.py:366] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
ERROR 11-21 19:46:53 engine.py:366] Traceback (most recent call last):
ERROR 11-21 19:46:53 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner_base.py", line 116, in _wrapper
ERROR 11-21 19:46:53 engine.py:366] return func(*args, **kwargs)
ERROR 11-21 19:46:53 engine.py:366] ^^^^^^^^^^^^^^^^^^^^^
ERROR 11-21 19:46:53 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner.py", line 1654, in execute_model
ERROR 11-21 19:46:53 engine.py:366] hidden_or_intermediate_states = model_executable(
ERROR 11-21 19:46:53 engine.py:366] ^^^^^^^^^^^^^^^^^
ERROR 11-21 19:46:53 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
ERROR 11-21 19:46:53 engine.py:366] return self._call_impl(*args, **kwargs)
ERROR 11-21 19:46:53 engine.py:366] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-21 19:46:53 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
ERROR 11-21 19:46:53 engine.py:366] return forward_call(*args, **kwargs)
ERROR 11-21 19:46:53 engine.py:366] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-21 19:46:53 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/pixtral.py", line 211, in forward
ERROR 11-21 19:46:53 engine.py:366] vision_embeddings = self._process_image_input(image_input)
ERROR 11-21 19:46:53 engine.py:366] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-21 19:46:53 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/pixtral.py", line 261, in _process_image_input
ERROR 11-21 19:46:53 engine.py:366] return self.vision_language_adapter(self.vision_encoder(image_input))
ERROR 11-21 19:46:53 engine.py:366] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-21 19:46:53 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
ERROR 11-21 19:46:53 engine.py:366] return self._call_impl(*args, **kwargs)
ERROR 11-21 19:46:53 engine.py:366] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-21 19:46:53 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
ERROR 11-21 19:46:53 engine.py:366] return forward_call(*args, **kwargs)
ERROR 11-21 19:46:53 engine.py:366] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-21 19:46:53 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/pixtral.py", line 572, in forward
ERROR 11-21 19:46:53 engine.py:366] positions = position_meshgrid(patch_embeds_list).to(self.device)
ERROR 11-21 19:46:53 engine.py:366] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-21 19:46:53 engine.py:366] RuntimeError: CUDA error: an illegal memory access was encountered
ERROR 11-21 19:46:53 engine.py:366] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
ERROR 11-21 19:46:53 engine.py:366] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
ERROR 11-21 19:46:53 engine.py:366] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
ERROR 11-21 19:46:53 engine.py:366]
ERROR 11-21 19:46:53 engine.py:366]
ERROR 11-21 19:46:53 engine.py:366] The above exception was the direct cause of the following exception:
ERROR 11-21 19:46:53 engine.py:366]
ERROR 11-21 19:46:53 engine.py:366] Traceback (most recent call last):
ERROR 11-21 19:46:53 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 357, in run_mp_engine
ERROR 11-21 19:46:53 engine.py:366] engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
ERROR 11-21 19:46:53 engine.py:366] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-21 19:46:53 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 119, in from_engine_args
ERROR 11-21 19:46:53 engine.py:366] return cls(ipc_path=ipc_path,
ERROR 11-21 19:46:53 engine.py:366] ^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-21 19:46:53 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 71, in __init__
ERROR 11-21 19:46:53 engine.py:366] self.engine = LLMEngine(*args, **kwargs)
ERROR 11-21 19:46:53 engine.py:366] ^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-21 19:46:53 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py", line 350, in __init__
ERROR 11-21 19:46:53 engine.py:366] self._initialize_kv_caches()
ERROR 11-21 19:46:53 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py", line 487, in _initialize_kv_caches
ERROR 11-21 19:46:53 engine.py:366] self.model_executor.determine_num_available_blocks())
ERROR 11-21 19:46:53 engine.py:366] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-21 19:46:53 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/distributed_gpu_executor.py", line 39, in determine_num_available_blocks
ERROR 11-21 19:46:53 engine.py:366] num_blocks = self._run_workers("determine_num_available_blocks", )
ERROR 11-21 19:46:53 engine.py:366] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-21 19:46:53 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/multiproc_gpu_executor.py", line 195, in _run_workers
ERROR 11-21 19:46:53 engine.py:366] driver_worker_output = driver_worker_method(*args, **kwargs)
ERROR 11-21 19:46:53 engine.py:366] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 11-21 19:46:53 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 11-21 19:46:53 engine.py:366] return func(*args, **kwargs)
ERROR 11-21 19:46:53 engine.py:366] ^^^^^^^^^^^^^^^^^^^^^
ERROR 11-21 19:46:53 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker.py", line 195, in determine_num_available_blocks
ERROR 11-21 19:46:53 engine.py:366] self.model_runner.profile_run()
ERROR 11-21 19:46:53 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 11-21 19:46:53 engine.py:366] return func(*args, **kwargs)
ERROR 11-21 19:46:53 engine.py:366] ^^^^^^^^^^^^^^^^^^^^^
ERROR 11-21 19:46:53 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner.py", line 1316, in profile_run
ERROR 11-21 19:46:53 engine.py:366] self.execute_model(model_input, kv_caches, intermediate_tensors)
ERROR 11-21 19:46:53 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 11-21 19:46:53 engine.py:366] return func(*args, **kwargs)
ERROR 11-21 19:46:53 engine.py:366] ^^^^^^^^^^^^^^^^^^^^^
ERROR 11-21 19:46:53 engine.py:366] File "/usr/local/lib/python3.12/dist-packages/vllm/worker/model_runner_base.py", line 146, in _wrapper
ERROR 11-21 19:46:53 engine.py:366] raise type(err)(f"Error in model execution: "
ERROR 11-21 19:46:53 engine.py:366] RuntimeError: Error in model execution: CUDA error: an illegal memory access was encountered
ERROR 11-21 19:46:53 engine.py:366] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
ERROR 11-21 19:46:53 engine.py:366] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
ERROR 11-21 19:46:53 engine.py:366] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
vLLM version: 0.6.4.post1+cu124
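The log notes that CUDA kernel errors can be reported asynchronously, so the frame it points at (`position_meshgrid(...).to(self.device)` in pixtral.py) may not be where the fault actually occurs, and it suggests passing CUDA_LAUNCH_BLOCKING=1. A minimal sketch of rerunning the same command that way follows. The names `SERVE_CMD` and `build_debug_launch` are hypothetical, chosen only for illustration, and nothing here is confirmed to change the outcome; it should only make the reported stack trace more trustworthy.

```python
import os
import shlex

# The serve command from this report, unchanged.
SERVE_CMD = (
    "vllm serve /models/Pixtral-Large-Instruct-2411 "
    "--served-model-name mistralai/Pixtral-Large-Instruct-2411 "
    "--config-format mistral --load-format mistral --tokenizer_mode mistral "
    "--limit_mm_per_prompt 'image=10' --tensor-parallel-size 8"
)


def build_debug_launch(cmd: str = SERVE_CMD) -> tuple[list[str], dict[str, str]]:
    """Return (argv, env) for rerunning cmd with synchronous CUDA kernel launches.

    Hypothetical helper: it only prepares the launch, it does not start vLLM.
    """
    env = dict(os.environ)              # copy, so the current process is untouched
    env["CUDA_LAUNCH_BLOCKING"] = "1"   # the setting suggested by the error log
    return shlex.split(cmd), env


argv, env = build_debug_launch()
# To actually launch (needs vLLM, the model weights, and the GPUs):
#   import subprocess
#   subprocess.run(argv, env=env, check=False)
```

Synchronous launches slow execution down considerably, so this is meant for a one-off diagnostic run rather than for serving.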