The example code in README.md might be wrong.

#3
by zhaokun - opened

Hi, thank you for your effort in making this quantized model.
I manually downloaded the model to a local directory with huggingface-cli and copied the code from the README, changing only MODEL_NAME, e.g. MODEL_NAME = "/home/ubuntu/tmp/joycaption2_nf4". When I run the code, it throws an exception and I have no clue how to fix it. Please help. My environment is:

torch                     2.5.1+cu121
torchaudio                2.5.1+cu121
torchvision               0.20.1+cu121
transformers              4.44.0
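
For reference, the relevant part of my script looks roughly like the sketch below. It is essentially the README example with only MODEL_NAME changed to the local path; I'm reproducing it approximately here, so the exact preprocessing may differ slightly from the README.

```python
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

# Local directory populated with `huggingface-cli download`
MODEL_NAME = "/home/ubuntu/tmp/joycaption2_nf4"
IMAGE_PATH = "test.jpg"  # placeholder image path

processor = AutoProcessor.from_pretrained(MODEL_NAME)
# The NF4 quantization config is expected to ship with the checkpoint,
# so no extra BitsAndBytesConfig is passed here.
llava_model = LlavaForConditionalGeneration.from_pretrained(MODEL_NAME, device_map="auto")
llava_model.eval()

image = Image.open(IMAGE_PATH)
convo = [
    {"role": "system", "content": "You are a helpful image captioner."},
    {"role": "user", "content": "Write a long descriptive caption for this image in a formal tone."},
]
convo_string = processor.apply_chat_template(convo, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[convo_string], images=[image], return_tensors="pt")

# This is the call that raises the RuntimeError shown below (line 67 of abc.py)
generate_ids = llava_model.generate(
    input_ids=inputs["input_ids"].to("cuda"),
    pixel_values=inputs["pixel_values"].to("cuda"),
    attention_mask=inputs["attention_mask"].to("cuda"),
    max_new_tokens=300,
    do_sample=True,
    suppress_tokens=None,
    use_cache=True,
)[0]
```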

The program output:

Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
Loading checkpoint shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 2/2 [00:02<00:00,  1.23s/it]
The `seen_tokens` attribute is deprecated and will be removed in v4.41. Use the `cache_position` model input instead.
Traceback (most recent call last):
  File "/home/ubuntu/tmp/abc.py", line 67, in <module>
    generate_ids = llava_model.generate(input_ids=input_ids.to('cuda'), pixel_values=pixel_values.to('cuda'), attention_mask=attention_mask.to('cuda'), max_new_tokens=300, do_sample=True, suppress_tokens=None, use_cache=True)[0]
  File "/home/ubuntu/anaconda3/envs/lora-trainer/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/lora-trainer/lib/python3.10/site-packages/transformers/generation/utils.py", line 2024, in generate
    result = self._sample(
  File "/home/ubuntu/anaconda3/envs/lora-trainer/lib/python3.10/site-packages/transformers/generation/utils.py", line 2982, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "/home/ubuntu/anaconda3/envs/lora-trainer/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/lora-trainer/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/lora-trainer/lib/python3.10/site-packages/accelerate/hooks.py", line 169, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/lora-trainer/lib/python3.10/site-packages/transformers/models/llava/modeling_llava.py", line 428, in forward
    image_outputs = self.vision_tower(pixel_values, output_hidden_states=True)
  File "/home/ubuntu/anaconda3/envs/lora-trainer/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/lora-trainer/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/lora-trainer/lib/python3.10/site-packages/accelerate/hooks.py", line 169, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/lora-trainer/lib/python3.10/site-packages/transformers/models/siglip/modeling_siglip.py", line 1188, in forward
    return self.vision_model(
  File "/home/ubuntu/anaconda3/envs/lora-trainer/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/lora-trainer/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/lora-trainer/lib/python3.10/site-packages/accelerate/hooks.py", line 169, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/lora-trainer/lib/python3.10/site-packages/transformers/models/siglip/modeling_siglip.py", line 1099, in forward
    pooler_output = self.head(last_hidden_state) if self.use_head else None
  File "/home/ubuntu/anaconda3/envs/lora-trainer/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/lora-trainer/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/lora-trainer/lib/python3.10/site-packages/accelerate/hooks.py", line 169, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/lora-trainer/lib/python3.10/site-packages/transformers/models/siglip/modeling_siglip.py", line 1126, in forward
    hidden_state = self.attention(probe, hidden_state, hidden_state)[0]
  File "/home/ubuntu/anaconda3/envs/lora-trainer/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/lora-trainer/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/lora-trainer/lib/python3.10/site-packages/accelerate/hooks.py", line 169, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/ubuntu/anaconda3/envs/lora-trainer/lib/python3.10/site-packages/torch/nn/modules/activation.py", line 1368, in forward
    attn_output, attn_output_weights = F.multi_head_attention_forward(
  File "/home/ubuntu/anaconda3/envs/lora-trainer/lib/python3.10/site-packages/torch/nn/functional.py", line 6251, in multi_head_attention_forward
    attn_output = linear(attn_output, out_proj_weight, out_proj_bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x1152 and 1x331776)

Thanks for the report. As it stands, the model doesn't work well with NF4. The issue is being worked on in JoyCaption's GitHub repo, and it's thought to be a bug in both bitsandbytes and PEFT. πŸ˜“

As for other quantization methods, the following quantized model may work, since it doesn't rely on bitsandbytes:
https://huggingface.co./OPEA/llama-joycaption-alpha-two-hf-llava-int4-sym-inc

Thank you for your patient explanation. I'll try the INT4 quantized model.
