OPEA/Marco-o1-int4-sym-awq-inc

Model Details

This model is an int4 quantization of AIDC-AI/Marco-o1 with group_size 128 and symmetric quantization, generated by intel/auto-round.
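To make the terms "symmetric" and "group_size" concrete, here is a minimal pure-Python sketch of group-wise symmetric int4 quantization. It uses plain round-to-nearest, which is an assumption made for illustration only: auto-round tunes the rounding instead of rounding to nearest, and its exact scale convention may differ. The function names (`quantize_group_sym`, `quantize_sym`, `dequantize`) are hypothetical and not part of any library.

```python
import random
from typing import List, Tuple

INT4_MIN, INT4_MAX = -8, 7  # signed 4-bit integer range


def quantize_group_sym(weights: List[float]) -> Tuple[List[int], float]:
    """Quantize one group to int4 with a single shared scale and no zero-point."""
    max_abs = max(abs(w) for w in weights)
    # Assumption: map the largest magnitude in the group to +/-7.
    scale = max_abs / INT4_MAX if max_abs > 0 else 1.0
    q = [min(INT4_MAX, max(INT4_MIN, round(w / scale))) for w in weights]
    return q, scale


def quantize_sym(weights: List[float], group_size: int = 128):
    """Split a flat weight list into groups and quantize each group independently."""
    return [
        quantize_group_sym(weights[start:start + group_size])
        for start in range(0, len(weights), group_size)
    ]


def dequantize(groups) -> List[float]:
    """Reconstruct approximate float weights from (int4 values, scale) pairs."""
    return [qi * scale for q, scale in groups for qi in q]


# Illustrative data only: 256 random weights give two groups of 128.
random.seed(0)
w = [random.uniform(-1, 1) for _ in range(256)]
groups = quantize_sym(w, group_size=128)
max_err = max(abs(a - b) for a, b in zip(w, dequantize(groups)))
print(f"{len(groups)} groups, max abs reconstruction error {max_err:.4f}")
```

A smaller group_size gives each scale fewer weights to cover, which usually lowers the reconstruction error at the cost of storing more scales.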

INT4 Inference (CPU/HPU/CUDA)

import torch
from typing import List, Dict
from transformers import AutoModelForCausalLM, AutoTokenizer



def load_model_and_tokenizer(path):
    tokenizer = AutoTokenizer.from_pretrained(path, 
                                              trust_remote_code=True
                                              )
    model = AutoModelForCausalLM.from_pretrained(path,
                                                 device_map="auto",  # change to target a specific device, e.g. "cpu"
                                                 trust_remote_code=True,
                                                 )
    model.eval()
    return tokenizer, model


def generate_response(model, tokenizer,
                      input_ids, attention_mask,
                      max_new_tokens=4096):
    """Greedy-decode token by token, streaming each new token to stdout."""
    generated_ids = input_ids
    with torch.inference_mode():
        for _ in range(max_new_tokens):
            # No KV cache is reused here, so every step re-runs the full sequence.
            outputs = model(input_ids=generated_ids, attention_mask=attention_mask)
            # Greedy decoding: pick the highest-scoring token at the last position.
            next_token_id = torch.argmax(outputs.logits[:, -1, :], dim=-1).unsqueeze(-1)
            generated_ids = torch.cat([generated_ids, next_token_id], dim=-1)
            attention_mask = torch.cat([attention_mask, torch.ones_like(next_token_id)], dim=-1)
            new_token = tokenizer.decode(next_token_id.squeeze(), skip_special_tokens=True)
            print(new_token, end='', flush=True)
            # Stop at the end-of-sequence token (assumes batch size 1).
            if next_token_id.item() == tokenizer.eos_token_id:
                break
    # Return only the newly generated text, without the prompt.
    return tokenizer.decode(generated_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)


def chat(model, tokenizer):
    history: List[Dict[str, str]] = []
    print("Enter 'q' to quit, 'c' to clear chat history.")
    while True:
        user_input = input("User: ").strip()
        # Lowercase only for command matching so the prompt keeps its original casing.
        command = user_input.lower()
        if command == 'q':
            print("Exiting chat.")
            break
        if command == 'c':
            print("Clearing chat history.")
            history.clear()
            continue
        if not user_input:
            print("Input cannot be empty.")
            continue

        history.append({"role": "user", "content": user_input})
        text = tokenizer.apply_chat_template(history, tokenize=False, add_generation_prompt=True)
        # Move inputs to the model's device instead of hard-coding 'cuda:0', so CPU and HPU also work.
        model_inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=4096).to(model.device)

        print('Assistant:', end=' ', flush=True)
        response = generate_response(model, tokenizer, model_inputs.input_ids, model_inputs.attention_mask)
        print()
        history.append({"role": "assistant", "content": response})


def main():
    path = "OPEA/Marco-o1-int4-sym-awq-inc"
    tokenizer, model = load_model_and_tokenizer(path)
    print('Starting chat.')
    chat(model, tokenizer)
if __name__ == "__main__":
    main()

"""9.11和9.8哪个数字大"""  # Prompt; in English: "Which number is larger, 9.11 or 9.8?"
## INT4
""" <Thought>
Alright, I need to determine which number is larger between 9.11 and 9.8. Let's break this down step by step to ensure accuracy.

First, I'll consider the structure of these numbers. Both numbers are decimal numbers, which means they have a whole number part and a fractional part separated by a decimal point. Specifically, 9.11 has a whole number part of 9 and a fractional part of 0.11. Similarly, 9.8 has a whole number part of 9 and a fractional part of 0.8.

To compare these two numbers, I'll start by comparing their whole number parts. Both numbers have the same whole number part, which is 9. Since the whole number parts are equal, the comparison will focus on the fractional parts.

Next, I'll look at the fractional parts. 9.11 has a fractional part of 0.11, while 9.8 has a fractional part of 0.8. To compare these, I'll convert them to fractions to make the comparison easier.

0.11 is equivalent to 11/100, and 0.8 is equivalent to 8/10. To compare these fractions, I'll find a common denominator. The least common denominator for 100 and 10 is 100.

Converting 8/10 to a denominator of 100:
8/10 = (8 * 10) / (10 * 10) = 80/100

Now, comparing 11/100 and 80/100:
11/100 < 80/100

This means that 0.11 < 0.8.

Since the fractional part of 9.11 is smaller than the fractional part of 9.8, and the whole number parts are equal, it follows that 9.11 is less than 9.8.

But wait, let me double-check to ensure I haven't made any calculation errors. Another way to compare these numbers is to subtract the smaller number from the larger one to see if the result is positive or negative.

9.8 - 9.11 = 0.69

Since the result is positive, this confirms that 9.8 is indeed larger than 9.11.

Additionally, I can visualize these numbers on a number line to further confirm. On a number line, 9.11 would be slightly to the right of 9, and 9.8 would be further to the right of 9.8. This spatial representation aligns with the numerical comparison I performed earlier.

Moreover, considering the context, if these numbers represent measurements or values in a real-world scenario, 9.8 would be the larger value, indicating a greater quantity or magnitude compared to 9.11.

In summary, after multiple methods of comparison—direct comparison of fractional parts, subtraction, and visualization on a number line—it's clear that 9.8 is larger than 9.11.

</Thought>
<Output>
9.8 is larger than 9.11.
</Output>"""

## BF16
"""Alright, I need to determine which number is larger between 9.11 and 9.8. Let's break this down step by step to ensure accuracy.

First, I'll consider the structure of these numbers. Both are decimal numbers, which means they have a whole number part and a fractional part separated by a decimal point. Specifically, 9.11 has a whole number part of 9 and a fractional part of 0.11. Similarly, 9.8 has a whole number part of 9 and a fractional part of 0.8.

Since both numbers have the same whole number part (9), the comparison will focus on their fractional parts. The fractional part of 9.11 is 0.11, and the fractional part of 9.8 is 0.8.

To compare 0.11 and 0.8, I'll convert them to fractions to make the comparison easier. 

0.11 is equivalent to 11/100, and 0.8 is equivalent to 80/100.

Now, comparing 11/100 and 80/100, it's clear that 80/100 is larger than 11/100 because 80 is greater than 11.

Therefore, 0.8 is greater than 0.11.

Since both numbers have the same whole number part and the fractional part of 9.8 is larger than that of 9.11, it follows that 9.8 is greater than 9.11.

But wait, let me double-check to ensure I haven't made any calculation errors. 

Calculating 9.11 and 9.8:

- 9.11 + 0.89 = 10.00
- 9.8 + 0.21 = 10.01

Comparing these sums, 10.01 is greater than 10.00, which further confirms that 9.8 is larger than 9.11.

Additionally, considering the place values:

- The tenths place in 9.11 is 1, and in 9.8 it's 8.
- The hundredths place in 9.11 is 1, and in 9.8 it's 0.

Since 8 is greater than 1 in the tenths place, this also indicates that 9.8 is larger.

Another way to look at it is by converting both numbers to percentages:

- 9.11 is 911%
- 9.8 is 980%

Clearly, 980% is greater than 911%, reinforcing that 9.8 is larger.

Lastly, thinking about real-world applications, if these numbers represent quantities like lengths or weights, 9.8 would be a larger measurement compared to 9.11.

In conclusion, after multiple verification methods, it's evident that 9.8 is indeed larger than 9.11.

</Thought>
<Output>
9.8 is larger than 9.11."""

User: strawberry中有几个r (in English: "How many r's are there in strawberry?")
## INT4
"""
strawberry中有几个r
Assistant: <Thought>
Alright, I need to determine how many 'r's are in the word "strawberry". Let's start by looking at the word character by character. 

First, I'll list out the letters in "strawberry":
- s
- t
- r
- a
- w
- b
- e
- r
- r
- y

Now, I'll count each 'r' individually:
1. The first 'r' is the third letter.
2. The second 'r' is the eighth letter.
3. The third 'r' is the ninth letter.

So, there are three 'r's in "strawberry".

But wait, let me double-check to ensure I haven't missed any. Sometimes, words can have repeated letters that might be easy to overlook. Scanning through again:
- s (not 'r')
- t (not 'r')
- r (first 'r')
- a (not 'r')
- w (not 'r')
- b (not 'r')
- e (not 'r')
- r (second 'r')
- r (third 'r')
- y (not 'r')

Yes, that confirms there are three 'r's.

Alternatively, I could think about the pronunciation of "strawberry" to see if the 'r's are pronounced distinctly. The word has a clear 'r' sound at the beginning and two more 'r' sounds towards the end. This further confirms the count.

Another way to verify is by using a simple counting method:
- First 'r': position 3
- Second 'r': position 8
- Third 'r': position 9

No other 'r's are present in the word.

Therefore, after multiple checks and different methods, I'm confident that there are three 'r's in "strawberry".

</Thought>
<Output>
strawberry has three r's
"""

## BF16
"""Assistant: <Thought>
Alright, I need to determine how many times the letter "r" appears in the word "strawberry". Let's break this down step by step to ensure accuracy.

First, I'll list out all the letters in "strawberry":
s, t, r, a, w, b, e, r, r, y

Now, I'll identify which of these letters are "r":
- The first "r" is the third letter.
- The second "r" is the eighth letter.

So, there are two instances of the letter "r" in "strawberry".

But wait, let me double-check to make sure I haven't missed any. Sometimes, words can have repeated letters, and it's easy to overlook one.

Looking again:
s, t, r, a, w, b, e, r, r, y

Yes, there are indeed two "r"s: one after the "t" and another after the "e".

Alternatively, I can count them by position:
1. s
2. t
3. r
4. a
5. w
6. b
7. e
8. r
9. r
10. y

Counting the "r"s: 3rd and 8th positions.

Just to be thorough, I'll consider if there's any possibility of miscounting. For example, sometimes letters can be confused with similar-looking characters, but in this case, "r" is distinct.

Another way to verify is to spell the word out loud:
s-t-r-a-w-b-e-r-r-y

Yes, the "r"s are clearly in the third and eighth positions.

Therefore, after multiple checks, I'm confident that there are two "r"s in "strawberry".

</Thought>
<Output>
There are 2 "r"s in "strawberry"."""

User: How many r in strawberry.

## INT4

"""Assistant: <Thought>
Alright, I need to figure out how many "r" letters are in the word "strawberry." Let's start by looking at the word itself: s-t-r-a-w-b-e-r-r-y. I'll go through each letter one by one to count the "r"s.

First, the first letter is "s," which is not an "r." Next is "t," also not an "r." Then we have "r," that's the first "r." Moving on to "a," still not an "r." The next letter is "w," no "r" there. Then we have another "r," that's the second "r." After that is "b," not an "r." Then "e," not an "r." Finally, we have two "r"s in a row: "r" and "r." So, counting them up: first "r," second "r," third "r," and fourth "r." That makes a total of four "r"s in the word "strawberry."

But wait, let me double-check to make sure I didn't miss any. Starting over: s-t-r-a-w-b-e-r-r-y. First "r" is here, second "r" here, third "r" here, and fourth "r" here. Yes, four "r"s in total.

Alternatively, I could think of it as counting the number of times the letter "r" appears in the word. Each time I see an "r," I'll mark it. So, s-t-r-a-w-b-e-r-r-y. First "r," second "r," third "r," fourth "r." Yep, four times.

Another way to verify is to use a simple counting method. I'll list out the letters and circle each "r": s-t-r-a-w-b-e-r-r-y. First "r" circled, second "r" circled, third "r" circled, fourth "r" circled. Confirmed, four "r"s.

I could also consider the position of each "r" to ensure I'm not counting any duplicates or missing any. The first "r" is in the third position, the second in the sixth, the third in the eighth, and the fourth in the ninth. All distinct positions, so no overlaps.

Additionally, thinking about the pronunciation, the "r"s are pronounced distinctly, so they are separate letters in the word. There's no blending or merging of "r"s that would make them indistinguishable.

In conclusion, after multiple methods of counting and verification, the word "strawberry" contains four "r"s."""



## BF16
"""Assistant: <Thought>
Alright, I need to figure out how many "r"s are in the word "strawberry." Let's start by looking at the word itself: s-t-r-a-w-b-e-r-r-y. Okay, so I'll count each "r" as I go through the letters.

First, the first letter is "s," which doesn't have an "r." Then "t" also doesn't have an "r." Next is "r" – that's the first "r." Moving on to "a" – no "r." Then "w" – no "r." "B" – no "r." "E" – no "r." Now, the first "r" again – that's the second "r." Then "e" – no "r." "R" – that's the third "r." Finally, "y" – no "r."

So, counting them up: first "r," second "r," third "r." That makes three "r"s in total.

But wait, let me double-check to make sure I didn't miss any. Starting over: s-t-r-a-w-b-e-r-r-y. Yes, the letters are s, t, r, a, w, b, e, r, r, y. So, r appears twice more after the first one. Definitely three "r"s.

Alternatively, I could think of it as the word "strawberry" having two syllables with "r" sounds: "straw" and "berry." In "straw," there's one "r," and in "berry," there's two "r"s. Adding them up: 1 + 2 = 3 "r"s.

Another way to verify is to consider the pronunciation. The word "strawberry" is pronounced as /ˈstrɔːr.bəri/, where the "r" sounds are clearly heard. Counting them: first "r" in "straw," second "r" in "berry," and a third "r" in "berry." So, three "r"s in total.

I think I've covered all possible ways to count the "r"s in "strawberry," and consistently, the number comes out to three.

</Thought>
<Output>
There are 3 r's in strawberry.
</Output>"""
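The sample responses above wrap the reasoning in <Thought> tags and the final answer in <Output> tags, although some responses omit the closing </Output> tag or the <Output> section entirely. As a rough sketch, the final answer can be pulled out with a small helper and compared against the ground truth for the two sample prompts. The helper name `extract_output` is hypothetical and not part of the model's tooling.

```python
import re
from decimal import Decimal
from typing import Optional


def extract_output(response: str) -> Optional[str]:
    """Return the text after <Output>, tolerating a missing </Output> tag.

    Returns None when the response has no <Output> section at all.
    """
    match = re.search(r"<Output>(.*?)(?:</Output>|\Z)", response, flags=re.DOTALL)
    return match.group(1).strip() if match else None


# Ground truth for the two sample prompts, computed directly.
r_count = "strawberry".count("r")
nine_eight_is_larger = Decimal("9.8") > Decimal("9.11")

# Shortened stand-in for a model response, for illustration only.
sample = "<Thought>\n...\n</Thought>\n<Output>\nThere are 3 r's in strawberry.\n</Output>"
print(extract_output(sample))
print(r_count, nine_eight_is_larger)
```

Comparing the extracted answer with these reference values makes it easier to spot which of the sample responses are actually correct.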

Evaluate the model

pip3 install lm-eval==0.4.5

We found that the accuracy drop is somewhat larger on some tasks; you may try other quantization algorithms or reduce group_size to 32.

auto-round --model "OPEA/Marco-o1-int4-sym-awq-inc" --eval --eval_bs 16 --tasks leaderboard_ifeval,leaderboard_mmlu_pro,gsm8k,lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,arc_easy,arc_challenge,cmmlu,ceval-valid

Metric	BF16	INT4
Avg	0.6471	0.6401
leaderboard_mmlu_pro 5 shots	0.4381	0.4271
leaderboard_ifeval	0.4570 = (0.5240 + 0.3900) / 2	0.4312 = (0.5000 + 0.3623) / 2
cmmlu	0.7924	0.7767
ceval-valid	0.7853	0.7786
gsm8k 5 shots	0.7976	0.7763
lambada_openai	0.6975	0.6912
hellaswag	0.6061	0.6015
winogrande	0.6946	0.7009
piqa	0.7927	0.7916
truthfulqa_mc1	0.4211	0.4149
openbookqa	0.3440	0.3500
boolq	0.8709	0.8713
arc_easy	0.8157	0.8106
arc_challenge	0.5461	0.5401
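The Avg row and the per-task gap between BF16 and INT4 can be recomputed from the table. The short sketch below copies the scores from the rows above and assumes the average is an unweighted mean over the 14 tasks, which matches the reported values.

```python
# (BF16, INT4) scores per task, copied from the table above.
scores = {
    "leaderboard_mmlu_pro": (0.4381, 0.4271),
    "leaderboard_ifeval": (0.4570, 0.4312),
    "cmmlu": (0.7924, 0.7767),
    "ceval-valid": (0.7853, 0.7786),
    "gsm8k": (0.7976, 0.7763),
    "lambada_openai": (0.6975, 0.6912),
    "hellaswag": (0.6061, 0.6015),
    "winogrande": (0.6946, 0.7009),
    "piqa": (0.7927, 0.7916),
    "truthfulqa_mc1": (0.4211, 0.4149),
    "openbookqa": (0.3440, 0.3500),
    "boolq": (0.8709, 0.8713),
    "arc_easy": (0.8157, 0.8106),
    "arc_challenge": (0.5461, 0.5401),
}

# Unweighted mean over all tasks.
bf16_avg = sum(bf16 for bf16, _ in scores.values()) / len(scores)
int4_avg = sum(int4 for _, int4 in scores.values()) / len(scores)
print(f"Avg  BF16 {bf16_avg:.4f}  INT4 {int4_avg:.4f}")

# Per-task change (INT4 minus BF16), largest drop first.
deltas = {task: int4 - bf16 for task, (bf16, int4) in scores.items()}
for task, delta in sorted(deltas.items(), key=lambda kv: kv[1]):
    print(f"{task:<22} {delta:+.4f}")
```

This reproduces the averages of 0.6471 and 0.6401 and shows that the largest drops are on leaderboard_ifeval and gsm8k, while winogrande, openbookqa, and boolq score slightly higher with INT4.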

Generate the model

Here is a sample command to generate the model.

auto-round \
--model AIDC-AI/Marco-o1 \
--device 0 \
--group_size 128 \
--nsamples 512 \
--bits 4 \
--iter 1000 \
--disable_eval \
--model_dtype "fp16" \
--format 'auto_awq,auto_gptq,auto_round' \
--output_dir "./tmp_autoround"

Ethical Considerations and Limitations

The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs.

Therefore, before deploying any applications of the model, developers should perform safety testing.

Caveats and Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

Here is a useful link to learn more about Intel's AI software:

Intel Neural Compressor

Disclaimer

The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.

Cite

@article{cheng2023optimize,
  title={Optimize weight rounding via signed gradient descent for the quantization of llms},
  author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi},
  journal={arXiv preprint arXiv:2309.05516},
  year={2023}
}

