ngxson posted an update 13 days ago
Check out my collection of pre-made GGUF LoRA adapters!

This allows you to use both the normal and the abliterated versions of popular models like Llama, Qwen, etc., without doubling your VRAM usage.
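Why a LoRA adapter avoids the doubled VRAM can be seen on a single linear layer. The NumPy sketch below is an illustration of the general LoRA idea, not llama.cpp's implementation. The layer sizes, rank `r`, and the `scale` factor (in real LoRA usually `alpha / r`) are hypothetical. The base weight `W` is stored once, and the adapter only adds two small matrices `A` and `B`. Setting `scale` to 0 gives the normal model; setting it to 1 gives the adapted variant.

```python
import numpy as np

def lora_forward(x, W, A, B, scale):
    """Compute W x + scale * B (A x) without materializing a merged weight."""
    return W @ x + scale * (B @ (A @ x))

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 64, 4              # hypothetical layer size and LoRA rank
W = rng.standard_normal((d_out, d_in))  # shared base weight, stored once
A = rng.standard_normal((r, d_in))      # LoRA down-projection
B = rng.standard_normal((d_out, r))     # LoRA up-projection
x = rng.standard_normal(d_in)

base_out = lora_forward(x, W, A, B, scale=0.0)     # adapter off: normal model
adapted_out = lora_forward(x, W, A, B, scale=1.0)  # adapter on: adapted variant

# Same result as keeping a second, fully merged copy of the weight
assert np.allclose(adapted_out, (W + B @ A) @ x)

full_params = W.size              # cost of a second full copy of this layer
adapter_params = A.size + B.size  # cost of the adapter: r * (d_in + d_out)
print(full_params, adapter_params)  # 4096 512
```

The adapter's size grows with `r * (d_in + d_out)` rather than `d_in * d_out`, so for small ranks it is a small fraction of the layer it modifies.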

ngxson/gguf_lora_collection

Tagging @bartowski, @MaziyarPanahi and @mradermacher: you may want to give this a try!

With my llama-cpp-python (0.3.4), an error occurs when applying the LoRA, probably because the following PR has not been merged yet. I tried it with Qwen 2.5 14B Instruct. Well, it will be updated eventually. 🙄
https://github.com/ggerganov/llama.cpp/issues/9114

This is super cool!!! Would you mind sharing how you made these GGUF LoRA adapters? Did you convert an existing LoRA into GGUF, or create the LoRA from the GGUF itself?


Yes, sure!

The first step is to generate a PEFT-compatible LoRA adapter; I used mergekit-extract-lora to do that. Please note that some bigger models (Qwen/Llama 70B) give errors that I don't know how to fix; hopefully they will be fixed soon. You can find more info about mergekit here: https://github.com/arcee-ai/mergekit
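The post does not describe how mergekit-extract-lora works internally. A common way to extract a LoRA from a fine-tuned model is to take a truncated SVD of the weight difference. The NumPy sketch below assumes that SVD-based approach and shows it on one hypothetical matrix. The function name `extract_lora` and all sizes are illustrative and not mergekit's API.

```python
import numpy as np

def extract_lora(W_base, W_tuned, rank):
    """Approximate W_tuned - W_base by B @ A with a rank-`rank` truncated SVD."""
    delta = W_tuned - W_base
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    # Split the top singular values evenly between the two factors
    sqrt_s = np.sqrt(S[:rank])
    B = U[:, :rank] * sqrt_s         # shape (d_out, rank)
    A = sqrt_s[:, None] * Vt[:rank]  # shape (rank, d_in)
    return A, B

rng = np.random.default_rng(1)
d_out, d_in, true_rank = 48, 32, 3
W_base = rng.standard_normal((d_out, d_in))
# Hypothetical "fine-tuned" weight whose difference from the base is exactly rank 3
W_tuned = W_base + rng.standard_normal((d_out, true_rank)) @ rng.standard_normal((true_rank, d_in))

A, B = extract_lora(W_base, W_tuned, rank=true_rank)
err = np.linalg.norm(W_base + B @ A - W_tuned)  # near zero, up to float rounding
print(A.shape, B.shape)  # (3, 32) (48, 3)
```

In a real model, the difference between two checkpoints is generally not exactly low-rank. The extracted adapter is therefore an approximation whose quality depends on the chosen rank.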

The next step is to convert the PEFT adapter to GGUF; I used this space: https://huggingface.co./spaces/ggml-org/gguf-my-lora

Then it's good to go!

Please note that the space can convert any PEFT LoRA adapter to GGUF, so if you're using something like unsloth, it is straightforward to convert your adapter into a GGUF LoRA (no need to merge it into the base model).