Llama.cpp imatrix quantizations of mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1

Using llama.cpp commit 3ad5451 for quantization.

All quants were made using the imatrix option and Bartowski's calibration file.
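For context, below is a minimal sketch of how imatrix quants are typically produced with llama.cpp's `llama-imatrix` and `llama-quantize` tools. The file names are placeholders and the snippet illustrates the general workflow, not the exact recipe or calibration file name used for this repo.

```python
# Illustrative sketch (not the exact commands used for this repo) of producing
# an imatrix-based quant with llama.cpp. All file names below are placeholders.
import subprocess

f16_gguf   = "DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-F16.gguf"  # placeholder: path to the F16 GGUF
calib_file = "calibration_data.txt"                            # placeholder: calibration text file
imatrix    = "imatrix.dat"

# 1. Compute the importance matrix over the calibration data.
subprocess.run(["llama-imatrix", "-m", f16_gguf, "-f", calib_file, "-o", imatrix], check=True)

# 2. Quantize using the importance matrix, e.g. to IQ4_XS.
subprocess.run(
    ["llama-quantize", "--imatrix", imatrix, f16_gguf,
     "DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-IQ4_XS.gguf", "IQ4_XS"],
    check=True,
)
```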


Perplexity table (the lower the better)

| Quant | Size (MB) | PPL | Size (%) | Accuracy (%) | PPL error rate |
|-------|-----------|-----|----------|--------------|----------------|
| IQ1_S | 489 | 88.4250 | 14.40 | 23.35 | 1.76 |
| IQ1_M | 516 | 53.8278 | 15.19 | 38.35 | 1.03 |
| IQ2_XXS | 560 | 45.5693 | 16.49 | 45.31 | 0.93 |
| IQ2_XS | 598 | 32.6813 | 17.61 | 63.17 | 0.62 |
| IQ2_S | 633 | 28.5477 | 18.64 | 72.32 | 0.54 |
| IQ2_M | 669 | 31.8272 | 19.70 | 64.87 | 0.63 |
| Q2_K_S | 683 | 28.7707 | 20.11 | 71.76 | 0.54 |
| Q2_K | 718 | 27.6342 | 21.14 | 74.71 | 0.51 |
| IQ3_XXS | 733 | 23.5511 | 21.58 | 87.66 | 0.44 |
| IQ3_XS | 793 | 22.9887 | 23.35 | 89.81 | 0.42 |
| Q3_K_S | 821 | 28.0462 | 24.17 | 73.61 | 0.53 |
| IQ3_S | 822 | 22.9268 | 24.20 | 90.05 | 0.42 |
| IQ3_M | 836 | 22.3167 | 24.62 | 92.51 | 0.41 |
| Q3_K_M | 881 | 22.5727 | 25.94 | 91.46 | 0.41 |
| Q3_K_L | 935 | 22.3758 | 27.53 | 92.27 | 0.41 |
| IQ4_XS | 972 | 21.3273 | 28.62 | 96.80 | 0.38 |
| IQ4_NL | 1018 | 21.3234 | 29.98 | 96.82 | 0.38 |
| Q4_0 | 1019 | 22.5210 | 30.00 | 91.67 | 0.41 |
| Q4_K_S | 1022 | 21.1717 | 30.09 | 97.51 | 0.38 |
| Q4_K_M | 1065 | 21.0532 | 31.36 | 98.06 | 0.38 |
| Q4_1 | 1109 | 21.1492 | 32.66 | 97.62 | 0.38 |
| Q5_K_S | 1201 | 20.7883 | 35.37 | 99.31 | 0.37 |
| Q5_0 | 1203 | 20.8643 | 35.42 | 98.95 | 0.37 |
| Q5_K_M | 1226 | 20.7488 | 36.10 | 99.50 | 0.37 |
| Q5_1 | 1293 | 20.7773 | 38.07 | 99.37 | 0.37 |
| Q6_K | 1396 | 20.6994 | 41.11 | 99.74 | 0.37 |
| Q8_0 | 1807 | 20.6659 | 53.21 | 99.90 | 0.37 |
| F16 | 3396 | 20.6457 | 100 | 100 | 0.37 |
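
The two percentage columns appear to be expressed relative to the F16 baseline; the table values are consistent with:

\[
\text{Size}(\%) = 100 \cdot \frac{\text{Size}_{\text{quant}}}{\text{Size}_{\text{F16}}},
\qquad
\text{Accuracy}(\%) = 100 \cdot \frac{\text{PPL}_{\text{F16}}}{\text{PPL}_{\text{quant}}}
\]

For example, for IQ4_XS: 100 × 20.6457 / 21.3273 ≈ 96.80.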

This is a version of the DeepSeek-R1-Distill-Qwen-1.5B model re-distilled for better performance.

Performance

| Models | DeepSeek-R1-Distill-Qwen-1.5B | DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1 |
|--------|-------------------------------|--------------------------------------|
| ARC (25-shot) | 40.96 | 41.55 |
| HellaSwag (10-shot) | 44 | 45.88 |
| MMLU (5-shot) | 39.27 | 41.82 |
| TruthfulQA-MC2 | 45.17 | 46.63 |
| Winogrande (5-shot) | 55.49 | 57.7 |
| GSM8K (5-shot) | 69.9 | 74.3 |
| Average | 49.13 | 51.31 |

| Models | DeepSeek-R1-Distill-Qwen-1.5B | DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1 |
|--------|-------------------------------|--------------------------------------|
| GPQA (0-shot) | 26.96 | 26.99 |
| MMLU PRO (5-shot) | 16.74 | 19.86 |
| MUSR (0-shot) | 35.93 | 36.6 |
| BBH (3-shot) | 35.12 | 37.23 |
| IfEval (0-shot) | 24.94 | 27.22 |

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

compute_dtype = torch.bfloat16
device   = 'cuda'
model_id = "mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1"

# Load the re-distilled model and its tokenizer
model     = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=compute_dtype, attn_implementation="sdpa", device_map=device)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Build the chat prompt with the model's chat template and generate a response
prompt  = "What is 1.5+102.2?"
chat    = tokenizer.apply_chat_template([{"role":"user", "content":prompt}], tokenize=True, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(chat.to(device), max_new_tokens=1024, do_sample=True)
print(tokenizer.decode(outputs[0]))

Output:

<|begin▁of▁sentence|><|User|>What is 1.5+102.2?<|Assistant|><think>
First, I identify the numbers involved in the addition: 1.5 and 102.2.

Next, I add the whole numbers: 1 + 102 equals 103.

Then, I add the decimal parts: 0.5 + 0.2 equals 0.7.

Finally, I combine the results: 103 + 0.7 equals 103.7.
</think>

To solve the addition \(1.5 + 102.2\), follow these steps:

1. **Add the whole numbers:**
   \[
   1 + 102 = 103
   \]

2. **Add the decimal parts:**
   \[
   0.5 + 0.2 = 0.7
   \]

3. **Combine the results:**
   \[
   103 + 0.7 = 103.7
   \]

So, the final answer is \(\boxed{103.7}\).<|end▁of▁sentence|>
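
To run one of the GGUF quants from this repo directly (instead of the full-precision transformers model above), a minimal sketch using llama-cpp-python is shown below. The filename glob is an assumption; check the repository's file list for the exact name of the quant you want.

```python
# Minimal sketch using llama-cpp-python (pip install llama-cpp-python).
# The filename pattern is an assumption; pick the quant you want from the repo.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-GGUF",
    filename="*Q4_K_M.gguf",  # glob matching the Q4_K_M quant (assumed naming)
    n_ctx=4096,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is 1.5+102.2?"}],
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])
```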