processor = PaliGemmaProcessor.from_pretrained(model_id) issue

#3
by lvfengchun - opened

```
Traceback (most recent call last):
  File "/Disk/lfc/paligemma2/inference.py", line 13, in <module>
    processor = PaliGemmaProcessor.from_pretrained(model_id)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Disk/Miniforge3/envs/lfc_test/lib/python3.11/site-packages/transformers/processing_utils.py", line 892, in from_pretrained
    args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Disk/Miniforge3/envs/lfc_test/lib/python3.11/site-packages/transformers/processing_utils.py", line 938, in _get_arguments_from_pretrained
    args.append(attribute_class.from_pretrained(pretrained_model_name_or_path, **kwargs))
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Disk/Miniforge3/envs/lfc_test/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2271, in from_pretrained
    return cls._from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Disk/Miniforge3/envs/lfc_test/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2505, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Disk/Miniforge3/envs/lfc_test/lib/python3.11/site-packages/transformers/models/gemma/tokenization_gemma_fast.py", line 103, in __init__
    super().__init__(
  File "/Disk/Miniforge3/envs/lfc_test/lib/python3.11/site-packages/transformers/tokenization_utils_fast.py", line 115, in __init__
    fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Exception: data did not match any variant of untagged enum ModelWrapper at line 2591977 column 3
```

Hi @lvfengchun,

I didn't encounter this error on my side; could you please refer to this gist file?

You're getting this error because PaliGemmaProcessor is unable to load the tokenizer: the tokenizer file (tokenizer.json) could not be parsed, most likely because it is corrupted, incompatible, or incorrectly formatted.
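One rough way to tell a corrupted download apart from a file the installed library simply cannot read is to check whether tokenizer.json is valid JSON at all. The sketch below assumes exactly that distinction; the helper names `diagnose_tokenizer_text` and `diagnose_tokenizer_file` are hypothetical, and a "valid JSON" result does not prove the file is compatible. It only suggests the parse error comes from the library version rather than from a truncated or damaged file.

```python
import json
from pathlib import Path


def diagnose_tokenizer_text(text: str) -> str:
    """Roughly classify the contents of a tokenizer.json file (hypothetical helper)."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        # A truncated or damaged download usually fails at this step.
        return f"corrupt: not valid JSON ({exc.msg} at line {exc.lineno})"
    model = data.get("model") if isinstance(data, dict) else None
    if not isinstance(model, dict):
        return "malformed: no 'model' section"
    # The JSON itself parses, so a "did not match any variant" error is more
    # likely caused by an installed library version that cannot read this layout.
    return f"valid JSON (model type {model.get('type')!r}); suspect a library version mismatch"


def diagnose_tokenizer_file(path: str) -> str:
    """Read a tokenizer.json from disk and classify it (hypothetical helper)."""
    p = Path(path)
    if not p.is_file():
        return "missing"
    try:
        text = p.read_text(encoding="utf-8")
    except UnicodeDecodeError:
        return "corrupt: not valid UTF-8"
    return diagnose_tokenizer_text(text)
```

For a checkpoint in the Hugging Face cache, the local path of the file can be obtained with huggingface_hub's `hf_hub_download(repo_id=model_id, filename="tokenizer.json")`; for a local copy, point it at the tokenizer.json next to the model files.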

To solve this issue, please make sure you load the PaliGemmaProcessor and the model from the same checkpoint:

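```python
from transformers import PaliGemmaForConditionalGeneration, PaliGemmaProcessor

model_id = "google/paligemma2-3b-pt-896"
# Load the processor (tokenizer + image processor) and the model from the same checkpoint
processor = PaliGemmaProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)
```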

If you're working on your local system, delete the locally cached tokenizer files for the model and download them again.
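A minimal sketch of clearing that cache from Python, assuming the default huggingface_hub cache layout, where each model is stored in a folder named `models--{org}--{name}` under the hub cache directory. The cache-root lookup here is simplified (it only checks `HF_HUB_CACHE` and `HF_HOME`), and the helper names are hypothetical. It defaults to a dry run, so nothing is deleted until you pass `dry_run=False`.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def hub_cache_root() -> Path:
    """Simplified guess at the hub cache root (assumption: HF_HUB_CACHE, then $HF_HOME/hub, then default)."""
    if os.environ.get("HF_HUB_CACHE"):
        return Path(os.environ["HF_HUB_CACHE"]).expanduser()
    if os.environ.get("HF_HOME"):
        return Path(os.environ["HF_HOME"]).expanduser() / "hub"
    return Path("~/.cache/huggingface/hub").expanduser()


def model_cache_dir(repo_id: str, cache_root: Optional[Path] = None) -> Path:
    """Map e.g. 'google/paligemma2-3b-pt-896' to <root>/models--google--paligemma2-3b-pt-896."""
    root = Path(cache_root) if cache_root is not None else hub_cache_root()
    return root / ("models--" + repo_id.replace("/", "--"))


def clear_model_cache(repo_id: str, cache_root: Optional[Path] = None,
                      dry_run: bool = True) -> Optional[Path]:
    """Return the cached folder for repo_id (deleting it unless dry_run), or None if absent."""
    target = model_cache_dir(repo_id, cache_root)
    if not target.is_dir():
        return None
    if not dry_run:
        shutil.rmtree(target)  # removes every cached file of this repo, not only the tokenizer
    return target
```

After the folder is removed, the next `from_pretrained(model_id)` call downloads fresh files. Passing `force_download=True` to `from_pretrained` should have a similar effect without deleting anything by hand.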

If the issue still persists, please let me know.

Thank you.

Google org

This is most likely due to an outdated version of transformers/tokenizers. Upgrading should fix the issue!
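To check whether an environment is affected, you can compare the installed transformers version against one known to work. The sketch below uses 4.47.0 only because that is the version reported working later in this thread, not an official minimum. The helper names are hypothetical, and they compare just the numeric release part, so pre-release suffixes such as `.dev0` are ignored.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

# Version reported working in this thread (assumption, not an official minimum).
MIN_TRANSFORMERS = "4.47.0"


def parse_release(version: str) -> Tuple[int, ...]:
    """Return the leading numeric release segment, e.g. '4.47.0.dev0' -> (4, 47, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_at_least(installed: str, minimum: str) -> bool:
    """Numeric comparison, so '4.100.0' correctly counts as newer than '4.47.0'."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so '4.47' compares equal to '4.47.0'.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_version(package: str) -> Optional[str]:
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for pkg in ("transformers", "tokenizers"):
        print(f"{pkg}: {installed_version(pkg) or 'not installed'}")
    current = installed_version("transformers")
    if current is not None and not is_at_least(current, MIN_TRANSFORMERS):
        print(f"transformers {current} is older than {MIN_TRANSFORMERS}; "
              "try: pip install -U transformers tokenizers")
```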

Hi @GopiUppari, I loaded the model and tokenizer locally and re-downloaded the tokenizer.json file, but I still get this error.

@Xenova It works for me with transformers==4.47.0.
Thanks!

Google org

Happy to help!

Xenova changed discussion status to closed
