Model Details
This model is an int4 quantization of Qwen/QwQ-32B-Preview with group_size 128 and symmetric quantization, generated by intel/auto-round. We excluded 3 layers from quantization due to an overflow issue on some int4 backends. You can find the AutoAWQ format here, which is slightly different from this one.
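To see which layers were excluded (and the overall scheme) without loading the weights, you can read the quantization settings shipped in config.json. This is a minimal sketch; the exact keys holding the per-layer overrides vary between auto-round versions, so treat the field names as assumptions:
# Sketch: inspect the quantization settings stored in config.json.
# Key names for per-layer overrides are assumptions; verify for your version.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("OPEA/QwQ-32B-Preview-int4-sym-mixed-inc")
qcfg = config.quantization_config  # plain dict parsed from config.json
print(qcfg.get("bits"), qcfg.get("group_size"), qcfg.get("sym"))
# Layers kept in higher precision are typically recorded in a field such as
# "extra_config" or "modules_to_not_convert":
print(qcfg.get("extra_config", qcfg.get("modules_to_not_convert")))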
How To Use
INT4 Inference (CPU/HPU/CUDA)
from auto_round import AutoRoundConfig  # must be imported to enable the auto-round format
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "OPEA/QwQ-32B-Preview-int4-sym-mixed-inc"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "How many r in strawberry."
messages = [
    {"role": "system", "content": "You are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
    do_sample=False  # greedy decoding for reproducibility; change to follow the official sampling usage
)
# Strip the prompt tokens so only the newly generated tokens are decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
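The snippet above decodes greedily so the outputs below are reproducible. To follow the official usage referenced in the comment, switch to sampling; the parameter values here are illustrative placeholders, so check the generation_config.json shipped with QwQ-32B-Preview for the recommended settings:
# Sketch: sampling-based generation; temperature/top_p values are assumptions,
# not the official ones -- see the model's generation_config.json.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.8,
)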
prompt = "9.11和9.8哪个数字大"
#INT4:
"""9.11和9.8,哪个数字大呢?我得好好想想。首先,这两个数字都是小数,也就是带小数点的数。9.11看起来像是9又11分之一,而9.8是9又8/10。不过,我得确认一下,因为在不同的上下文中,小数的表示可能有所不同。
首先,我需要明确这两个数字的表示方式。在常见的十进制系统中,小数点左边的部分是整数部分,右边是小数部分。所以,9.11应该是9加上0.11,而9.8是9加上0.8。
如果这是标准的十进制小数,那么比较它们的大小就比较简单。显然,0.8大于0.11,所以9.8应该大于9.11。但是,我得再确认一下,因为有时候小数的表示可能有不同含义,比如在某些上下文中,小数点后面的部分可能代表不同的东西。
另外,我注意到9.11可能被误解为9月11日,也就是一个日期,而9.8可能被理解为9.8,一个单纯的小数。但如果按照日期来理解,9.11是9月11日,而9.8如果是9月8日,那么显然9月11日晚于9月8日。但是,题目中给出的是9.11和9.8,没有明确指出是日期还是小数,所以我假设它们都是小数。
为了确保,我再检查一下。在数学中,小数点表示法是国际通用的,小数点后面的部分表示分数部分。所以,9.11=9+0.11,而9.8=9+0.8。现在,比较0.11和0.8,显然0.8大于0.11,因此9.8大于9.11。
但是,也许从另一个角度考虑。有时候,小数可能表示百分比。比如,9.11可能表示9.11%,而9.8表示9.8%。如果是这样,那么9.8%大于9.11%。或者,如果它们表示的是版本号,比如软件的版本,那么比较方法可能不同。例如,在有些版本号系统中,9.11可能大于9.8,因为11大于8。"""
##BF16
"""9.11和9.8,哪个数字大呢?我得想一想。首先,这两个数字都是小数,而且都以9开头。9.11是九点一一,9.8是九点八。我觉得9.8可能更大,因为八比一一要大。但是,我有点不确定,因为11是两位数,可能表示更大。
让我来仔细比较一下。在小数比较中,先看整数部分,它们的整数部分都是9,所以一样。那就要看小数部分,首先是十分位。9.11的十分位是1,9.8的十分位是8。8比1大,所以9.8应该更大。
不过,我再想想,也许有人会认为11比8大,因为11是两位数。但其实,在小数比较中,位数不是决定因素,而是每位上的数字大小。所以,尽管9.11的小数部分是两位,但它的十分位是1,而9.8的十分位是8,所以9.8更大。
为了更确定,我可以把它们转换成分数或者 decimal 形式来比较。比如说,9.11等于9又11/100,而9.8等于9又80/100。很明显,80/100大于11/100,所以9.8更大。
或者,我可以把它们都转换成百分数。9.11是911%,9.8是980%。980%大于911%,所以9.8更大。
另外,如果我想象一下在数轴上,9.11是在9和10之间的某个位置,而9.8是在更靠近10的地方。所以,9.8当然大于9.11。
再者,我可以减去9,看看小数部分。9.11减去9是0.11,9.8减去9是0.8。0.8大于0.11,所以9.8更大。
我还可以考虑它们的差值。9.8减去9.11等于0.69,这是一个正数,说明9.8大于9.11。
或者,我可以把它们都乘以100,变成整数。9.11乘以100是911,"""
prompt = "How many r in strawberry."
##INT4:
"""Let's see. I have this question: "How many r's in strawberry?" Okay, first things first, I need to figure out what exactly is being asked here. It seems straightforward—counting the number of times the letter 'r' appears in the word "strawberry." But let's make sure.
So, the word is "strawberry." I should probably write it down and look at each letter to count the 'r's. Let's spell it out: s-t-r-a-w-b-e-r-r-y. Okay, that's how it's spelled.
Now, to count the 'r's, I need to go through each letter and see how many times 'r' appears. Let's do that:
First letter: s, not r.
Second letter: t, not r.
Third letter: r, okay, that's one r.
Fourth letter: a, not r.
Fifth letter: w, not r.
Sixth letter: b, not r.
Seventh letter: e, not r.
Eighth letter: r, that's another r.
Ninth letter: r, that's yet another r.
Tenth letter: r, wait, no, that's y, not r.
Wait, let me check again. s-t-r-a-w-b-e-r-r-y. So, positions 3,8,9 are r's, and the last letter is y, not r. So, there are three r's in strawberry.
But, just to be thorough, let's make sure. s-t-r-a-w-b-e-r-r-y. Letters 3,8,9 are r's, and the rest are not. So, definitely three r's.
But, the question is "How many r's in strawberry?" and I think it's a simple letter counting, but maybe there's more to it. Perhaps it's a trick question or something.
Wait, in some contexts, 'r' can stand for something else, like in measurements or other abbreviations, but in this case, it's clearly about the letter in the word.
Alternatively, maybe it's about the phonetic sounds, like how many times the 'r' sound appears in the pronunciation of strawberry. But the question specifies 'r's in strawberry,' which probably means the letter count.
In British English, strawberry is spelled the same way, but in some words, there are differences, like color vs. colour, but in this case, it's the same.
Also, worth"""
##BF16:
"""Let's see. The word is "strawberry." I need to find out how many times the letter "r" appears in it. Okay, so I'll look at each letter in the word one by one.
First letter: s - that's not r.
Second letter: t - no, not r.
Third letter: r - okay, that's one r.
Fourth letter: a - not r.
Fifth letter: w - not r.
Sixth letter: b - no.
Seventh letter: e - not r.
Eighth letter: r - another r, so that's two rs.
Ninth letter: r - wait, is there a ninth letter? Let me check. S-t-r-a-w-b-e-r-r-y. Yes, there are two rs, but I think there might be more.
Wait, let's count again. S-t-r-a-w-b-e-r-r-y. That's 10 letters. So, positions:
1: s
2: t
3: r
4: a
5: w
6: b
7: e
8: r
9: r
10: y
So, positions 3, 8, and 9 are rs. That means there are three rs in "strawberry."
But earlier I thought there were only two. Maybe I missed one. Let's double-check.
S-t-r-a-w-b-e-r-r-y.
r is the third letter, then the eighth, and the ninth. So, three rs.
Wait, but sometimes people might pronounce it differently, but in the spelling, it's three rs.
I think the answer is three.
**Final Answer**
\[ \boxed{3} \]
"""
Evaluate the model
pip3 install lm-eval==0.4.5
auto-round --model "OPEA/QwQ-32B-Preview-int4-sym-mixed-inc" --eval --eval_bs 16 --tasks leaderboard_ifeval,leaderboard_mmlu_pro,gsm8k,lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,arc_easy,arc_challenge,cmmlu,ceval-valid
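Alternatively, the harness can be driven directly from Python. A minimal sketch using lm-eval's simple_evaluate; the task subset and model_args string here are illustrative:
# Sketch: evaluate a subset of tasks via lm-eval's Python API
# (task list and batch size are illustrative)
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=OPEA/QwQ-32B-Preview-int4-sym-mixed-inc,dtype=auto",
    tasks=["gsm8k", "arc_easy"],
    batch_size=16,
)
for task, metrics in results["results"].items():
    print(task, metrics)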
| Metric | BF16 | INT4 | INT4 (iter1000, nsamples 512) |
| --- | --- | --- | --- |
| Avg | 0.6846 | 0.6857 | 0.6826 |
| leaderboard_mmlu_pro (5 shots) | 0.5773 | 0.5736 | 0.5733 |
| leaderboard_ifeval (inst_level_strict_acc) | 0.4043 = (0.4628+0.3457)/2 | 0.3919 = (0.4436+0.3401)/2 | 0.4028 = (0.4544+0.3512)/2 |
| gsm8k (5 shots) | 0.8271 | 0.8294 | 0.8423 |
| cmmlu | 0.8795 | 0.8730 | 0.8736 |
| ceval-valid | 0.8730 | 0.8685 | 0.8633 |
| lambada_openai | 0.7565 | 0.7625 | 0.7609 |
| hellaswag | 0.6646 | 0.6608 | 0.6596 |
| winogrande | 0.7443 | 0.7577 | 0.7498 |
| piqa | 0.8128 | 0.8172 | 0.8112 |
| truthfulqa_mc1 | 0.4162 | 0.4211 | 0.4100 |
| openbookqa | 0.3440 | 0.3560 | 0.3360 |
| boolq | 0.9003 | 0.8988 | 0.8972 |
| arc_easy | 0.8279 | 0.8300 | 0.8224 |
| arc_challenge | 0.5572 | 0.5597 | 0.5538 |
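The Avg row is the unweighted mean of the 14 task scores beneath it, e.g. for the BF16 column:
# Quick check: BF16 "Avg" = unweighted mean of the 14 task scores
bf16 = [0.5773, 0.4043, 0.8271, 0.8795, 0.8730, 0.7565, 0.6646,
        0.7443, 0.8128, 0.4162, 0.3440, 0.9003, 0.8279, 0.5572]
print(round(sum(bf16) / len(bf16), 4))  # 0.6846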
Generate the model
Here is a sample command to generate the model. For symmetric quantization, we found that overflow/NaN can occur on some backends, so it is better to fall back some layers to higher precision. auto-round version >= 0.4.1 is required.
auto-round \
--model Qwen/QwQ-32B-Preview \
--device 0 \
--group_size 128 \
--bits 4 \
--disable_eval \
--model_dtype "fp16" \
--fp_layers "model.layers.5.mlp.down_proj,model.layers.5.mlp.up_proj,model.layers.5.mlp.gate_proj" \
--format 'auto_round' \
--output_dir "./tmp_autoround"
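The same quantization can also be driven from Python. A minimal sketch, assuming auto-round's AutoRound class and a layer_config-style mechanism for the fp16 fallback layers; check the auto-round documentation for the exact argument names in your installed version:
# Sketch of an equivalent Python flow; the layer_config handling is an
# assumption -- verify against the auto-round docs for your version.
from auto_round import AutoRound
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/QwQ-32B-Preview"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="float16")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Keep the three overflow-prone layers in 16-bit (mirrors --fp_layers above)
layer_config = {
    "model.layers.5.mlp.down_proj": {"bits": 16},
    "model.layers.5.mlp.up_proj": {"bits": 16},
    "model.layers.5.mlp.gate_proj": {"bits": 16},
}

autoround = AutoRound(model, tokenizer, bits=4, group_size=128, sym=True,
                      layer_config=layer_config)
autoround.quantize()
autoround.save_quantized("./tmp_autoround", format="auto_round")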
Ethical Considerations and Limitations
The model can produce factually incorrect output and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, this model could generate lewd, biased, or otherwise offensive outputs.
Therefore, before deploying any applications of the model, developers should perform safety testing.
Caveats and Recommendations
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.
Here is a useful link to learn more about Intel's AI software:
- Intel Neural Compressor
Disclaimer
The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.
Cite
@article{cheng2023optimize,
  title={Optimize weight rounding via signed gradient descent for the quantization of llms},
  author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi},
  journal={arXiv preprint arXiv:2309.05516},
  year={2023}
}