Haihao Shen

Haihao

https://github.com/intel/auto-round

AI & ML interests

LLM quantization, sparsity, and acceleration

Articles

Building Cost-Efficient Enterprise RAG applications with Intel Gaudi 2 and Intel Xeon

Accelerate StarCoder with 🤗 Optimum Intel on Xeon: Q8/Q4 and Speculative Decoding

Haihao's activity

reacted to wenhuach's post with 🚀 8 days ago

Post

This week, OPEA Space released several new INT4 models, including:
nvidia/Llama-3.1-Nemotron-70B-Instruct-HF
allenai/OLMo-2-1124-13B-Instruct
THUDM/glm-4v-9b
AIDC-AI/Marco-o1
and several others.
Let us know which models you'd like prioritized for quantization, and we'll do our best to make it happen!

https://huggingface.co./OPEA

3 replies

authored a paper 21 days ago

A dynamic parallel method for performance optimization on hybrid CPUs

Paper • 2411.19542 • Published 26 days ago • 5

upvoted a paper 21 days ago

A dynamic parallel method for performance optimization on hybrid CPUs

Paper • 2411.19542 • Published 26 days ago • 5

commented on a paper 21 days ago

A dynamic parallel method for performance optimization on hybrid CPUs

Paper • 2411.19542 • Published 26 days ago • 5

liked a model 27 days ago

OPEA/Meta-Llama-3.1-70B-Instruct-int4-asym-inc

Updated 3 days ago • 28 • 1

New activity in Intel/neural-chat-7b-v3 about 1 month ago

Adding `safetensors` variant of this model

#9 opened about 1 month ago by

New activity in Intel/neural-chat-7b-v3-3 about 1 month ago

Adding `safetensors` variant of this model

#14 opened 2 months ago by

authored 3 papers 3 months ago

Efficient LLM Inference on CPUs

Paper • 2311.00502 • Published Nov 1, 2023 • 7

Effective Quantization for Diffusion Models on CPUs

Paper • 2311.16133 • Published Nov 2, 2023 • 4

Fast DistilBERT on CPUs

Paper • 2211.07715 • Published Oct 27, 2022

upvoted a paper 3 months ago

Effective Quantization for Diffusion Models on CPUs

Paper • 2311.16133 • Published Nov 2, 2023 • 4

upvoted an article 4 months ago

Article

Building Cost-Efficient Enterprise RAG applications with Intel Gaudi 2 and Intel Xeon

May 9 • 12

liked a model 5 months ago

Intel/Meta-Llama-3.1-8B-Instruct-int4-inc

Updated 27 days ago • 2

liked a model 6 months ago

Intel/neural-embedding-v1

Updated Jun 26 • 13

upvoted an article 7 months ago

Article

Accelerate StarCoder with 🤗 Optimum Intel on Xeon: Q8/Q4 and Speculative Decoding

Jan 30 • 9

liked a Space 7 months ago

Shanghainese TTS

liked a Space 8 months ago

Low-bit Quantized Open LLM Leaderboard

Track, rank and evaluate open LLMs and chatbots

liked a model 9 months ago

Intel/phi-2-int4-inc

Text Generation • Updated Oct 22 • 14 • 2

liked a Space 9 months ago

HHEM Leaderboard

commented on a paper 10 months ago

Efficient Post-training Quantization with FP8 Formats

Paper • 2309.14592 • Published Sep 26, 2023 • 10