wenhua cheng

wenhuach

AI & ML interests

Model Compression, CV

Organizations

Intel · Need4Speed · Qwen

wenhuach's activity

reacted to their post with 🚀 6 days ago
Check out the DeepSeek-R1 INT2 model (OPEA/DeepSeek-R1-int2-mixed-sym-inc). This 200GB model shows only about a 2% drop in MMLU, though it's quite slow due to a kernel issue.

| Task          | BF16   | INT2-mixed |
| ------------- | ------ | ---------- |
| mmlu | 0.8514 | 0.8302 |
| hellaswag | 0.6935 | 0.6657 |
| winogrande | 0.7932 | 0.7940 |
| arc_challenge | 0.6212 | 0.6084 |
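
A minimal loading sketch, assuming the `auto-round` package is installed so transformers can resolve the quantization config; the model card is the authoritative recipe:

```python
# Sketch: load the INT2 checkpoint with transformers.
# Assumes `pip install auto-round transformers accelerate`; exact
# requirements and the tested loading code are on the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OPEA/DeepSeek-R1-int2-mixed-sym-inc"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",       # shard the ~200GB checkpoint across devices
    torch_dtype="auto",
    trust_remote_code=True,  # DeepSeek-R1 ships custom modeling code
)

inputs = tokenizer("Explain INT2 quantization briefly.", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0], skip_special_tokens=True))
```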
posted an update 7 days ago
posted an update 17 days ago
replied to their post 2 months ago

While that may be one reason, it doesn't fully explain why there are still many quantized models available for LLaMA 3.1 and LLaMA 3.3.

reacted to their post with 🚀 3 months ago
posted an update 3 months ago
replied to their post 3 months ago

You can try using `auto-round-fast xxx` for a slight accuracy drop, or `auto-round-fast xxx --nsamples 1 --iters 1` for very fast execution without algorithm tuning.
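
For reference, a rough Python-API equivalent of that fast path, with argument names taken from the auto-round README (they may differ across versions, and the model name below is only a placeholder):

```python
# Sketch of the `--nsamples 1 --iters 1` fast path via the Python API.
# Argument names assumed from the auto-round docs; verify against your version.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"  # placeholder model for illustration
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# nsamples=1 and iters=1 skip nearly all tuning: fastest run, lowest accuracy.
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, sym=True,
                      nsamples=1, iters=1)
autoround.quantize()
autoround.save_quantized("./qmodel")
```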

replied to their post 3 months ago

Thank you for your suggestion. As our focus is on algorithm development and our computational resources are limited, we currently lack the bandwidth to support a large number of models. If you come across a model that would benefit from quantization, feel free to leave a comment on any model under the OPEA org; we will do our best to prioritize and quantize it if resources allow.

reacted to their post with 🔥👀 3 months ago
AutoRound has demonstrated strong results even at 2-bit precision for VLMs such as Qwen2-VL-72B. Check it out here: OPEA/Qwen2-VL-72B-Instruct-int2-sym-inc.
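
A hypothetical loading sketch for this checkpoint, assuming `auto-round` is installed and a transformers version with Qwen2-VL support; defer to the model card for the tested code path:

```python
# Sketch: load the INT2 Qwen2-VL checkpoint. Untested; assumes
# `pip install auto-round` and transformers >= 4.45 (Qwen2-VL support).
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor

model_id = "OPEA/Qwen2-VL-72B-Instruct-int2-sym-inc"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id,
    device_map="auto",  # spread the 72B model across available GPUs
    torch_dtype="auto",
)
processor = AutoProcessor.from_pretrained(model_id)
```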
posted an update 3 months ago
reacted to their post with ❤️ 3 months ago
This week, OPEA Space released several new INT4 models, including:
nvidia/Llama-3.1-Nemotron-70B-Instruct-HF
allenai/OLMo-2-1124-13B-Instruct
THUDM/glm-4v-9b
AIDC-AI/Marco-o1
and several others.
Let us know which models you'd like prioritized for quantization, and we'll do our best to make it happen!

https://huggingface.co/OPEA
replied to their post 3 months ago
posted an update 3 months ago
New activity in OPEA/glm-4-9b-chat-int4-sym-inc 3 months ago

Update README.md

#1 opened 3 months ago by wenhuach