Running Llama3-8B-1.58-100B-tokens on CPU
#2 by chiauho - opened
Hi, the example given on how to use the model still loads and runs it on GPU. How can I run it on CPU? Thanks for any pointers.
Hi, sorry for the late reply. It can run on CPU, but it's slow due to the unpacking logic, so it's advisable to run it on GPU. To run it on CPU, just specify that in the device_map: device_map="cpu"
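A minimal sketch of what that looks like with transformers (the repo id below is assumed from the thread title; expect generation on CPU to be slow because of the weight-unpacking logic mentioned above):

```python
# Sketch: load the 1.58-bit model on CPU by passing device_map="cpu".
# The repo id is assumed from the thread title -- adjust it if yours differs.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HF1BitLLM/Llama3-8B-1.58-100B-tokens"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cpu",  # place every module on the CPU instead of the GPU
)

inputs = tokenizer("What is 1.58-bit quantization?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```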
Ok, thank you very much.
I was hoping that a 1-bit model like this would be able to run on CPU without a GPU, or even on ARM.
If you are interested, check out this Space: it uses bitnet.cpp to run the model on CPU, and it's much faster: https://huggingface.co./spaces/medmekk/BitNet.cpp
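For running it locally rather than in the Space, the usual bitnet.cpp workflow looks roughly like this (a sketch only: the script names and flags are assumed from the Microsoft BitNet repository and may differ between versions):

```shell
# Sketch of a typical bitnet.cpp setup; script names and flags are assumptions,
# so check the BitNet repository's README for the exact invocation.
git clone --recursive https://github.com/microsoft/BitNet.git
cd BitNet
pip install -r requirements.txt

# Download the model and convert it to a GGUF the CPU kernels can use
# (the quantization type and output path are assumed defaults).
python setup_env.py --hf-repo HF1BitLLM/Llama3-8B-1.58-100B-tokens -q i2_s

# Run CPU inference against the converted model.
python run_inference.py \
    -m models/Llama3-8B-1.58-100B-tokens/ggml-model-i2_s.gguf \
    -p "What is 1.58-bit quantization?"
```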