Error when deploying model to an Inference Endpoint or Amazon SageMaker endpoint
The code suggested for deploying the model to Amazon SageMaker or to an Inference Endpoint is not working. There are errors about shards or the tokenizer file.
Hi @iuf26,
Below are the possible reasons for the above issue:

**Model Sharding Issue**
The model weights are split across multiple shards because of the model's large size. If the environment where the model is deployed doesn't properly load all shards, you will encounter errors. To avoid this problem, make sure that all shard files (model-00001-of-00002.bin, model-00002-of-00002.bin, etc.) are uploaded to the same directory in the storage location (S3 for SageMaker).

**Tokenizer File Issue**
The tokenizer configuration files (tokenizer.json, tokenizer_config.json) may be missing or incorrectly referenced in your code or deployment setup. To avoid this, make sure that the tokenizer files are included in the model directory. Also use the `device_map` parameter (e.g., `"auto"`) to manage large models efficiently during loading, and ensure sufficient memory is allocated on the endpoint instance; see the sketch below.
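As a quick local check, you can verify that all shards and tokenizer files load together before deploying. This is a minimal sketch: the directory path is a placeholder, and `device_map="auto"` requires the accelerate package to be installed.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path: a directory that should contain config.json, every
# shard file (model-00001-of-00002.bin, model-00002-of-00002.bin, ...),
# tokenizer.json, and tokenizer_config.json.
model_dir = "./my-model"

# device_map="auto" spreads the sharded weights across the available
# devices instead of forcing everything onto a single GPU.
model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_dir)

# If a shard or tokenizer file is missing, from_pretrained raises an
# error here rather than at request time on the endpoint.
```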
If the issue still persists, could you please share screenshots of the error message? That will help us assist in a better way.
Thank you.
Hi @GopiUppari,
Thank you for your answer above.
The problem is that PaliGemma exists only in newer Transformers versions, and SageMaker uses an image that is not updated (it only recognizes Transformers 4.37 as the latest version). When you deploy the model using HuggingFaceModel, you encounter an error at inference time stating that the model type 'paligemma' is not recognized.
To deploy this model, you would need a custom ECR image for inference that uses a newer version of Transformers (I used transformers==4.47.0).
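For reference, a minimal sketch of that setup. The custom image can simply extend an AWS Hugging Face PyTorch inference DLC and pin the newer library (e.g., a Dockerfile whose only extra step is `RUN pip install transformers==4.47.0`). Once the image is pushed to your own ECR repository, you point HuggingFaceModel at it through image_uri. Every URI, path, role, and instance type below is a placeholder, not the exact values I used:

```python
from sagemaker.huggingface import HuggingFaceModel

# All identifiers below are placeholders: substitute your own custom
# ECR image, S3 model artifact, and SageMaker execution role.
model = HuggingFaceModel(
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/paligemma-inference:latest",
    model_data="s3://<bucket>/paligemma/model.tar.gz",
    role="<sagemaker-execution-role-arn>",
)

# Deploy to a real-time endpoint; a GPU instance with enough memory
# for the sharded weights is assumed here.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)
```

Because image_uri is supplied explicitly, the SDK uses your custom container instead of resolving one of the stock images that only ship Transformers 4.37.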