General discussion.
Call me clueless, but I could have sworn there were at least some general prebuilt executables for Linux in the regular llama.cpp releases. Well, it's all macOS and Windows. My day is ruined.
Can't blame them; it's too much overhead when people who use Linux should already know how to build their own packages.
So I didn't want to believe this...
After some testing, making the actual quants is really slow, so it's recommended to only use it for the initial FP16 GGUF and imatrix.dat generation.
...because I was thinking that "it can't be that bad".
If anyone also thought that, well...
It actually is very slow. I don't want to imagine what quantizing to the new smaller types like IQ3/IQ2 would look like. I used the free Colab tier, but I don't think that would scale.
But!
It's really not a bad solution if you need to generate the imatrix data and don't have the hardware for it. That part is pretty fast, since it's GPU-bound.
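For reference, the fast GPU-bound part is just the FP16 conversion plus the imatrix run; with a recent llama.cpp it looks roughly like this (model paths and the calibration file are placeholders):

python convert_hf_to_gguf.py ./model-dir --outtype f16 --outfile model-f16.gguf
./llama-imatrix -m model-f16.gguf -f calibration.txt -o imatrix.dat -ngl 99

The slow CPU-bound part is the quantization afterwards, e.g. ./llama-quantize --imatrix imatrix.dat model-f16.gguf model-IQ3_XXS.gguf IQ3_XXS.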
The script is broken because of upstream changes.
I don't have time to fix it at the moment.
This is not a good way to do it.
- Colab has limited storage space for GPU instances
- Colab only has 2 CPU cores
It's recommended to do this locally or on another cloud provider (paid Colab isn't great).
I'll probably do it locally, but I've got to figure out how to do it.
If you are on Windows, give https://huggingface.co./FantasiaFoundry/GGUF-Quantization-Script a try.
I am on Linux
You will have to compile llama.cpp from source.
I was gone for a couple of months, so I am also unsure how to do it now.
Some of the build flags changed and I can't get it to compile with CUDA (most likely due to me being on Arch).
When I get it working, I will see if I can add support for Linux to the script.
It will take a while, however; I am not a coder and I don't have much free time.
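For anyone else stuck here, the basic from-source build is roughly this (assuming an NVIDIA card and the Makefile-based build llama.cpp was using at the time):

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make -j GGML_CUDA=1

GGML_CUDA=1 is the flag that replaced the older LLAMA_CUBLAS/LLAMA_CUDA naming.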
I'm also on Arch, so it looks like we are in the same dilemma.
I think it is either GCC or the new 555 NVIDIA drivers :|
If I figure it out, I'll let you know.
I got this error running the notebook.
They have changed the naming for a few things. I made changes reflecting that in the Windows script, so you can use it as a reference to get started: the convert script now uses underscores instead of hyphens, and the executables received a llama- prefix.
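If I remember the renames right, the mapping looks like this (old name on the left, current name on the right):

convert-hf-to-gguf.py -> convert_hf_to_gguf.py
quantize -> llama-quantize
imatrix -> llama-imatrix
main -> llama-cli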
I think llama.cpp now provides pre-built Linux binaries? They are tagged ubuntu, so I imagine they are expected to be used on servers. I'm not too familiar with the Linux side of things or with the broader compatibility situation across distributions; my experience is basically just Ubuntu on the server side.
I finally got time to figure out the issue.
You need to set your CUDA architecture.
Example:
make -j 16 GGML_CUDA=1 CUDA_POWER_ARCH=75
Never mind, you just need to run: make -j 8 GGML_CUDA=1
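If anyone does need to set the architecture explicitly, recent NVIDIA drivers can report your GPU's compute capability directly (7.5 corresponds to the 75 in the example above):

nvidia-smi --query-gpu=compute_cap --format=csv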
I added Linux support to the script:
https://huggingface.co./FantasiaFoundry/GGUF-Quantization-Script/discussions/36