Running this model using vLLM Docker

#8, opened by moficodes

The "Use this model" instructions in the corner of the model page (the vLLM option) say to run this:

docker run --runtime nvidia --gpus all \
    --name my_vllm_container \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
     --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
    -p 8000:8000 \
    --ipc=host \
    vllm/vllm-openai:latest \
    --model unsloth/DeepSeek-R1-GGUF

How do I choose which quantization to run?
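
For example, I'm guessing I need to download one specific quantization locally and then point --model at the GGUF file instead of the repo id. Something like the sketch below, assuming a Q4_K_M variant exists in the repo and that vLLM's GGUF support accepts a local file path (the file names here are illustrative, not the exact ones in the repo):

# Download only the files for one quantization (the include pattern is a guess at the naming scheme)
huggingface-cli download unsloth/DeepSeek-R1-GGUF \
    --include "*Q4_K_M*" \
    --local-dir ~/models/DeepSeek-R1-GGUF

# Then point vLLM at the downloaded .gguf file (hypothetical file name) rather than the repo id
docker run --runtime nvidia --gpus all \
    --name my_vllm_container \
    -v ~/models/DeepSeek-R1-GGUF:/models \
    --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
    -p 8000:8000 \
    --ipc=host \
    vllm/vllm-openai:latest \
    --model /models/DeepSeek-R1-Q4_K_M.gguf

Is that the right approach, or is there a flag to select the quantization directly from the repo?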
