How was this PEFT model merged with the base?

by RonanMcGovern

Thanks for putting this together, and for the Colab notebook.

I tried to merge and unload the adapters so I could push a full model to the Hub, but I'm getting this error:

Cannot merge LORA layers when the model is gptq quantized

after trying:

from transformers import AutoModelForCausalLM
from peft import PeftModel

# load the base model; device_map must be "auto", it cannot be "cpu"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# load the PEFT model with the new adapters
model = PeftModel.from_pretrained(
    model,
    adapter_model_name,
)

model = model.merge_and_unload()  # merge the adapters into the base model
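
If `model_id` points at the GPTQ-quantized checkpoint, PEFT refuses to fold LoRA deltas into quantized layers, which is exactly the error above. A common workaround is to merge the adapters into the unquantized (fp16) base model and push that instead. A minimal sketch, assuming the fp16 weights of the same base are available on the Hub; `base_model_id`, `adapter_model_name`, and the target repo name are placeholders, not values from this thread:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Placeholders: point these at the fp16 base checkpoint and the adapter repo.
base_model_id = "org/base-model-fp16"        # hypothetical id
adapter_model_name = "org/lora-adapter"      # hypothetical id

# Load the *unquantized* base in fp16 so the LoRA weights can be merged in.
base = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Attach the adapters, then fold them into the dense weights.
merged = PeftModel.from_pretrained(base, adapter_model_name)
merged = merged.merge_and_unload()

# Push the full merged model (plus tokenizer) to the Hub.
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
merged.push_to_hub("your-username/merged-model")      # hypothetical repo name
tokenizer.push_to_hub("your-username/merged-model")
```

If you then want a GPTQ version of the merged model, you would re-quantize it after merging, since merging back into an already-quantized checkpoint is what triggers the error.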
