Llama.cpp imatrix quantizations of mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1

Using llama.cpp commit 3ad5451 for quantization.

All quants were made using the imatrix option and Bartowski's calibration file.
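For context, below is a minimal sketch of how imatrix quants are typically produced with llama.cpp's `llama-imatrix` and `llama-quantize` tools. The file names are placeholders and the snippet illustrates the general workflow, not the exact recipe or calibration file name used for this repo.

```python
# Illustrative sketch (not the exact commands used for this repo) of producing
# an imatrix-based quant with llama.cpp. All file names below are placeholders.
import subprocess

f16_gguf   = "DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-F16.gguf"  # placeholder: path to the F16 GGUF
calib_file = "calibration_data.txt"                            # placeholder: calibration text file
imatrix    = "imatrix.dat"

# 1. Compute the importance matrix over the calibration data.
subprocess.run(["llama-imatrix", "-m", f16_gguf, "-f", calib_file, "-o", imatrix], check=True)

# 2. Quantize using the importance matrix, e.g. to IQ4_XS.
subprocess.run(
    ["llama-quantize", "--imatrix", imatrix, f16_gguf,
     "DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-IQ4_XS.gguf", "IQ4_XS"],
    check=True,
)
```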


Perplexity table (the lower the better)

| Quant | Size (MB) | PPL | Size (%) | Accuracy (%) | PPL error rate |
|-------|-----------|-----|----------|--------------|----------------|
| IQ1_S | 489 | 88.4250 | 14.40 | 23.35 | 1.76 |
| IQ1_M | 516 | 53.8278 | 15.19 | 38.35 | 1.03 |
| IQ2_XXS | 560 | 45.5693 | 16.49 | 45.31 | 0.93 |
| IQ2_XS | 598 | 32.6813 | 17.61 | 63.17 | 0.62 |
| IQ2_S | 633 | 28.5477 | 18.64 | 72.32 | 0.54 |
| IQ2_M | 669 | 31.8272 | 19.70 | 64.87 | 0.63 |
| Q2_K_S | 683 | 28.7707 | 20.11 | 71.76 | 0.54 |
| Q2_K | 718 | 27.6342 | 21.14 | 74.71 | 0.51 |
| IQ3_XXS | 733 | 23.5511 | 21.58 | 87.66 | 0.44 |
| IQ3_XS | 793 | 22.9887 | 23.35 | 89.81 | 0.42 |
| Q3_K_S | 821 | 28.0462 | 24.17 | 73.61 | 0.53 |
| IQ3_S | 822 | 22.9268 | 24.20 | 90.05 | 0.42 |
| IQ3_M | 836 | 22.3167 | 24.62 | 92.51 | 0.41 |
| Q3_K_M | 881 | 22.5727 | 25.94 | 91.46 | 0.41 |
| Q3_K_L | 935 | 22.3758 | 27.53 | 92.27 | 0.41 |
| IQ4_XS | 972 | 21.3273 | 28.62 | 96.80 | 0.38 |
| IQ4_NL | 1018 | 21.3234 | 29.98 | 96.82 | 0.38 |
| Q4_0 | 1019 | 22.5210 | 30.00 | 91.67 | 0.41 |
| Q4_K_S | 1022 | 21.1717 | 30.09 | 97.51 | 0.38 |
| Q4_K_M | 1065 | 21.0532 | 31.36 | 98.06 | 0.38 |
| Q4_1 | 1109 | 21.1492 | 32.66 | 97.62 | 0.38 |
| Q5_K_S | 1201 | 20.7883 | 35.37 | 99.31 | 0.37 |
| Q5_0 | 1203 | 20.8643 | 35.42 | 98.95 | 0.37 |
| Q5_K_M | 1226 | 20.7488 | 36.10 | 99.50 | 0.37 |
| Q5_1 | 1293 | 20.7773 | 38.07 | 99.37 | 0.37 |
| Q6_K | 1396 | 20.6994 | 41.11 | 99.74 | 0.37 |
| Q8_0 | 1807 | 20.6659 | 53.21 | 99.90 | 0.37 |
| F16 | 3396 | 20.6457 | 100 | 100 | 0.37 |
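
The two percentage columns appear to be expressed relative to the F16 baseline; the table values are consistent with:

\[
\text{Size}(\%) = 100 \cdot \frac{\text{Size}_{\text{quant}}}{\text{Size}_{\text{F16}}},
\qquad
\text{Accuracy}(\%) = 100 \cdot \frac{\text{PPL}_{\text{F16}}}{\text{PPL}_{\text{quant}}}
\]

For example, for IQ4_XS: 100 × 20.6457 / 21.3273 ≈ 96.80.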

This is a version of the DeepSeek-R1-Distill-Qwen-1.5B model re-distilled for better performance.

Performance

| Models | DeepSeek-R1-Distill-Qwen-1.5B | DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1 |
|--------|-------------------------------|--------------------------------------|
| ARC (25-shot) | 40.96 | 41.55 |
| HellaSwag (10-shot) | 44 | 45.88 |
| MMLU (5-shot) | 39.27 | 41.82 |
| TruthfulQA-MC2 | 45.17 | 46.63 |
| Winogrande (5-shot) | 55.49 | 57.7 |
| GSM8K (5-shot) | 69.9 | 74.3 |
| Average | 49.13 | 51.31 |

| Models | DeepSeek-R1-Distill-Qwen-1.5B | DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1 |
|--------|-------------------------------|--------------------------------------|
| GPQA (0-shot) | 26.96 | 26.99 |
| MMLU PRO (5-shot) | 16.74 | 19.86 |
| MUSR (0-shot) | 35.93 | 36.6 |
| BBH (3-shot) | 35.12 | 37.23 |
| IfEval (0-shot) | 24.94 | 27.22 |

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

compute_dtype = torch.bfloat16
device   = 'cuda'
model_id = "mobiuslabsgmbh/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1"

# Load the re-distilled model and its tokenizer
model     = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=compute_dtype, attn_implementation="sdpa", device_map=device)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Build the chat prompt with the model's chat template and generate a response
prompt  = "What is 1.5+102.2?"
chat    = tokenizer.apply_chat_template([{"role":"user", "content":prompt}], tokenize=True, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(chat.to(device), max_new_tokens=1024, do_sample=True)
print(tokenizer.decode(outputs[0]))

Output:

<|begin▁of▁sentence|><|User|>What is 1.5+102.2?<|Assistant|><think>
First, I identify the numbers involved in the addition: 1.5 and 102.2.

Next, I add the whole numbers: 1 + 102 equals 103.

Then, I add the decimal parts: 0.5 + 0.2 equals 0.7.

Finally, I combine the results: 103 + 0.7 equals 103.7.
</think>

To solve the addition \(1.5 + 102.2\), follow these steps:

1. **Add the whole numbers:**
   \[
   1 + 102 = 103
   \]

2. **Add the decimal parts:**
   \[
   0.5 + 0.2 = 0.7
   \]

3. **Combine the results:**
   \[
   103 + 0.7 = 103.7
   \]

So, the final answer is \(\boxed{103.7}\).<|end▁of▁sentence|>
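
To run one of the GGUF quants from this repo directly (instead of the full-precision transformers model above), a minimal sketch using llama-cpp-python is shown below. The filename glob is an assumption; check the repository's file list for the exact name of the quant you want.

```python
# Minimal sketch using llama-cpp-python (pip install llama-cpp-python).
# The filename pattern is an assumption; pick the quant you want from the repo.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="ThomasBaruzier/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.1-GGUF",
    filename="*Q4_K_M.gguf",  # glob matching the Q4_K_M quant (assumed naming)
    n_ctx=4096,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is 1.5+102.2?"}],
    max_tokens=512,
)
print(out["choices"][0]["message"]["content"])
```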