Running this model using vLLM Docker
#8
opened by moficodes
The "Use this model" instructions for vLLM (in the corner of the model page) say to run this:
docker run --runtime nvidia --gpus all \
--name my_vllm_container \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HUGGING_FACE_HUB_TOKEN=<secret>" \
-p 8000:8000 \
--ipc=host \
vllm/vllm-openai:latest \
--model unsloth/DeepSeek-R1-GGUF
How do I choose which quantization to run?
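For context, one approach I'm considering (a sketch only, not verified against this repo's layout) is to download a single quantization locally and point --model at the resulting GGUF file instead of the repo id. The folder name DeepSeek-R1-Q4_K_M, the local paths, and the <quant-file> placeholder below are assumptions that would need to be checked against the repo's "Files and versions" listing. Would something like this be the right direction?

# Download only one quantization variant (folder name is an assumption;
# check the repo's file listing for the exact directory).
huggingface-cli download unsloth/DeepSeek-R1-GGUF \
  --include "DeepSeek-R1-Q4_K_M/*" \
  --local-dir ~/models/DeepSeek-R1-GGUF

# Mount the download and point vLLM at the GGUF file rather than the repo id.
docker run --runtime nvidia --gpus all \
  --name my_vllm_container \
  -v ~/models/DeepSeek-R1-GGUF:/models \
  --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
  -p 8000:8000 \
  --ipc=host \
  vllm/vllm-openai:latest \
  --model /models/DeepSeek-R1-Q4_K_M/<quant-file>.gguf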