Running this model using vLLM Docker

#8, opened by moficodes

The "Use this model" instructions in the corner of the model page (the vLLM option) say to run this:

docker run --runtime nvidia --gpus all \
    --name my_vllm_container \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
     --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
    -p 8000:8000 \
    --ipc=host \
    vllm/vllm-openai:latest \
    --model unsloth/DeepSeek-R1-GGUF

How do I choose which quantization to run?
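
For example, I'm guessing I need to download one specific quantization locally and then point --model at the GGUF file instead of the repo id. Something like the sketch below, assuming a Q4_K_M variant exists in the repo and that vLLM's GGUF support accepts a local file path (the file names here are illustrative, not the exact ones in the repo):

# Download only the files for one quantization (the include pattern is a guess at the naming scheme)
huggingface-cli download unsloth/DeepSeek-R1-GGUF \
    --include "*Q4_K_M*" \
    --local-dir ~/models/DeepSeek-R1-GGUF

# Then point vLLM at the downloaded .gguf file (hypothetical file name) rather than the repo id
docker run --runtime nvidia --gpus all \
    --name my_vllm_container \
    -v ~/models/DeepSeek-R1-GGUF:/models \
    --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
    -p 8000:8000 \
    --ipc=host \
    vllm/vllm-openai:latest \
    --model /models/DeepSeek-R1-Q4_K_M.gguf

Is that the right approach, or is there a flag to select the quantization directly from the repo?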
