Running Llama3-8B-1.58-100B-tokens on CPU

#2
by chiauho - opened
Hugging Face 1Bit LLMs org

Hi, the example given on how to use the model still loads and runs it on GPU. How can I run it on CPU? Thanks for any pointers.


Hi, sorry for the late reply. It can run on CPU, but it's slow due to the unpacking logic, so it's advisable to run it on GPU. To run it on CPU, just specify that in the device map: device_map="cpu".
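A minimal sketch of what that looks like with transformers, assuming the HF1BitLLM/Llama3-8B-1.58-100B-tokens repo id (the model this thread is about) and a transformers version recent enough to load the packed 1.58-bit weights:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id for the model discussed in this thread.
model_id = "HF1BitLLM/Llama3-8B-1.58-100B-tokens"


def load_on_cpu(model_id: str):
    """Load the model entirely on CPU via device_map="cpu"."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # device_map="cpu" keeps all weights on the CPU; inference will be
    # slower than on GPU because of the weight-unpacking logic.
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cpu")
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = load_on_cpu(model_id)
    inputs = tokenizer("Once upon a time", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The only change versus the GPU example is the device_map argument; everything else stays the same.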


Ok, thank you very much.

I was hoping that a 1-bit model like this would be able to run on CPU without a GPU, or even on ARM.


If you are interested, check out this Space; it uses bitnet.cpp to run the model on CPU, and it's much faster: https://huggingface.co./spaces/medmekk/BitNet.cpp
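For running bitnet.cpp locally rather than through the Space, a rough sketch of the setup is below. The repo URL, script names, and flags are taken from Microsoft's BitNet project as I recall it and should be treated as assumptions; check that project's README for the current instructions.

```shell
# Assumed workflow for Microsoft's bitnet.cpp (script names and flags
# may differ in newer versions -- verify against the BitNet README).
git clone --recursive https://github.com/microsoft/BitNet.git
cd BitNet
pip install -r requirements.txt

# Download and convert the model to the i2_s packed format.
python setup_env.py --hf-repo HF1BitLLM/Llama3-8B-1.58-100B-tokens -q i2_s

# Run CPU inference on the converted GGUF file.
python run_inference.py \
  -m models/Llama3-8B-1.58-100B-tokens/ggml-model-i2_s.gguf \
  -p "Once upon a time"
```

Because bitnet.cpp uses optimized ternary kernels instead of unpacking weights at runtime, CPU inference is much faster than with the transformers path above, including on ARM.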
