This is an FP8-Dynamic quantization of DeepSeek-R1-Distill-Qwen-32B.

DeepSeek's distilled models are compact reasoning models derived from DeepSeek-R1. They perform strongly because the reasoning patterns of the larger model are distilled into smaller architectures. The models range from 1.5B to 70B parameters and are based on Qwen2.5 and Llama3. The standout, DeepSeek-R1-Distill-Qwen-32B, outperforms OpenAI-o1-mini and sets new benchmark results for dense models. By combining reinforcement learning (RL) and supervised fine-tuning (SFT), these open-source models provide a strong resource for research and practical applications.

Evaluations

This model provides an accuracy recovery of 100.04% relative to the unquantized DeepSeek-R1-Distill-Qwen-32B (average score 74.06 vs. 74.03).

| English | DeepSeek-R1-Distill-Qwen-32B | DeepSeek-R1-Distill-Qwen-32B-FP8-Dynamic (this) |
| --- | --- | --- |
| Avg. | 74.03 | 74.06 |
| ARC | 68.2 | 68.9 |
| Hellaswag | 74 | 73.7 |
| MMLU | 79.88 | 79.57 |

We did not check for data contamination. Evaluation was run with the LM Evaluation Harness using limit=1000, so each score is based on at most 1,000 examples per task.
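As a quick sanity check, the averages and the recovery figure can be recomputed from the per-benchmark scores in the table above. This is a minimal sketch that assumes "accuracy recovery" means the quantized average divided by the baseline average, which the card does not state explicitly. The variable and function names are illustrative.

```python
# Scores copied from the evaluation table (ARC, Hellaswag, MMLU).
baseline = {"ARC": 68.2, "Hellaswag": 74.0, "MMLU": 79.88}
quantized = {"ARC": 68.9, "Hellaswag": 73.7, "MMLU": 79.57}


def mean(scores):
    """Unweighted mean over the benchmark scores."""
    return sum(scores.values()) / len(scores)


# Round to two decimals, matching the precision used in the table.
baseline_avg = round(mean(baseline), 2)
quantized_avg = round(mean(quantized), 2)

# Assumption: recovery = quantized average / baseline average, in percent.
recovery = 100 * quantized_avg / baseline_avg

print(f"baseline avg:  {baseline_avg:.2f}")
print(f"quantized avg: {quantized_avg:.2f}")
print(f"recovery:      {recovery:.2f}%")
```

Under that assumption the script reproduces the reported averages (74.03 and 74.06) and the 100.04% recovery. A difference this small is likely within the noise of a 1,000-example evaluation.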

Usage

Install vLLM and run the server:

python -m vllm.entrypoints.openai.api_server --model cortecs/DeepSeek-R1-Distill-Qwen-32B-FP8-Dynamic

Access the model:

curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "cortecs/DeepSeek-R1-Distill-Qwen-32B-FP8-Dynamic",
        "prompt": "San Francisco is a"
    }'
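The same request can be sent from Python. The sketch below uses only the standard library and assumes the server from the previous step is listening on the default localhost:8000. The helper names `build_request` and `complete` and the `max_tokens` default are illustrative choices, not part of the model card.

```python
import json
import urllib.request

# Assumption: the vLLM server is running locally on the default port.
BASE_URL = "http://localhost:8000/v1"
MODEL = "cortecs/DeepSeek-R1-Distill-Qwen-32B-FP8-Dynamic"


def build_request(prompt, max_tokens=64):
    """Build a POST request for the OpenAI-compatible completions endpoint."""
    payload = {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{BASE_URL}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, max_tokens=64):
    """Send the prompt to the server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, max_tokens)) as response:
        body = json.load(response)
    # The completions response lists generations under "choices".
    return body["choices"][0]["text"]
```

With the server running, `print(complete("San Francisco is a"))` should print the model's continuation of the prompt.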