This is an FP8-Dynamic quantization of DeepSeek-R1-Distill-Qwen-32B.
DeepSeek's distilled models are compact reasoning models derived from DeepSeek-R1. They distill the reasoning patterns of the larger model into smaller architectures. Spanning 1.5B to 70B parameters, the models are based on Qwen2.5 and Llama3, with the standout DeepSeek-R1-Distill-Qwen-32B outperforming OpenAI-o1-mini and setting new benchmarks for dense models. Built by combining reinforcement learning (RL) and supervised fine-tuning (SFT), these open-source models are a strong resource for research and practical applications.
Evaluations
This model provides an accuracy recovery of 100.04%.
English | DeepSeek-R1-Distill-Qwen-32B | DeepSeek-R1-Distill-Qwen-32B-FP8-Dynamic (this) |
---|---|---|
Avg. | 74.03 | 74.06 |
ARC | 68.2 | 68.9 |
Hellaswag | 74 | 73.7 |
MMLU | 79.88 | 79.57 |
We did not check for data contamination.
Evaluation was done using Eval. Harness with limit=1000.
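The accuracy recovery figure can be reproduced from the table with a few lines of arithmetic. The sketch below assumes that recovery is defined as the quantized model's average score divided by the original model's average score; the card does not state the formula, and the function names are illustrative.

```python
# Scores copied from the evaluation table above.
baseline = {"ARC": 68.2, "Hellaswag": 74.0, "MMLU": 79.88}
quantized = {"ARC": 68.9, "Hellaswag": 73.7, "MMLU": 79.57}


def mean(scores):
    """Unweighted average over the benchmark scores."""
    return sum(scores.values()) / len(scores)


def recovery_percent(quantized_avg, baseline_avg):
    """Assumed definition: quantized average as a percentage of the baseline average."""
    return 100.0 * quantized_avg / baseline_avg


baseline_avg = round(mean(baseline), 2)
quantized_avg = round(mean(quantized), 2)
recovery = recovery_percent(quantized_avg, baseline_avg)

print(f"baseline avg:  {baseline_avg}")
print(f"quantized avg: {quantized_avg}")
print(f"recovery:      {recovery:.2f}%")
```

Under this assumption the averages match the Avg. row of the table, and the ratio rounds to the reported 100.04%.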
Usage
Install vLLM and run the server:
python -m vllm.entrypoints.openai.api_server --model cortecs/DeepSeek-R1-Distill-Qwen-32B-FP8-Dynamic
Access the model:
curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d ' {
"model": "cortecs/DeepSeek-R1-Distill-Qwen-32B-FP8-Dynamic",
"prompt": "San Francisco is a"
} '
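The same request can also be sent from Python. This is a minimal sketch using only the standard library, assuming the server started above is listening on localhost:8000. The helper names `build_request` and `complete` and the `max_tokens` value are illustrative choices, not part of the original card.

```python
import json
import urllib.request

# Endpoint and model name taken from the curl example above.
BASE_URL = "http://localhost:8000/v1/completions"
MODEL = "cortecs/DeepSeek-R1-Distill-Qwen-32B-FP8-Dynamic"


def build_request(prompt, max_tokens=64):
    """Build a POST request for the OpenAI-compatible completions endpoint.

    max_tokens is an assumption added for illustration; the curl example omits it.
    """
    payload = {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, max_tokens=64):
    """Send the prompt to the running server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, max_tokens)) as resp:
        body = json.load(resp)
    # The completions response lists generated candidates under "choices".
    return body["choices"][0]["text"]


# Requires the vLLM server to be running:
# print(complete("San Francisco is a"))
```

Building the request separately from sending it makes the payload easy to inspect before any network call is made.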
Base model: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B