Error when deploying model to an Inference Endpoint or Amazon SageMaker endpoint
The code suggested for deploying the model to Amazon SageMaker or to an Inference Endpoint is not working. There are errors about shards or the tokenizer file.
Hi @iuf26,
Below are the possible reasons for the above issue:

**Model Sharding Issue**
The model weights are split across multiple shards because of the model's large size. If the environment where the model is deployed doesn't properly load all shards, you will encounter errors. To avoid this problem, make sure that all shard files (model-00001-of-00002.bin, model-00002-of-00002.bin, etc.) are uploaded to the same directory in the storage location (S3 for SageMaker).

**Tokenizer File Issue**
The tokenizer configuration files (tokenizer.json, tokenizer_config.json) may be missing or incorrectly referenced in your code or deployment setup. To avoid this, make sure that the tokenizer files are included in the model directory. Also use the `device_map` parameter (e.g., `"auto"`) to manage large models efficiently during loading, and ensure sufficient memory is allocated on the endpoint instance; see the sketch below.
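As a quick local check, you can verify that all shards and tokenizer files load together before deploying. This is a minimal sketch: the directory path is a placeholder, and `device_map="auto"` requires the accelerate package to be installed.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path: a directory that should contain config.json, every
# shard file (model-00001-of-00002.bin, model-00002-of-00002.bin, ...),
# tokenizer.json, and tokenizer_config.json.
model_dir = "./my-model"

# device_map="auto" spreads the sharded weights across the available
# devices instead of forcing everything onto a single GPU.
model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_dir)

# If a shard or tokenizer file is missing, from_pretrained raises an
# error here rather than at request time on the endpoint.
```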
If the issue still persists, could you please share screenshots of the error message? That will help us assist in a better way.
Thank you.
Hi @GopiUppari,
Thank you for your answer above.
The problem is that PaliGemma exists only in newer Transformers versions, and SageMaker uses an image that is not updated (it only recognizes Transformers 4.37 as the latest version). When you deploy the model using HuggingFaceModel, you encounter an error at inference time stating that the model type 'paligemma' is not recognized.
To deploy this model, you would need a custom ECR image for inference that uses a newer version of Transformers (I used transformers==4.47.0).
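For reference, a minimal sketch of that setup. The custom image can simply extend an AWS Hugging Face PyTorch inference DLC and pin the newer library (e.g., a Dockerfile whose only extra step is `RUN pip install transformers==4.47.0`). Once the image is pushed to your own ECR repository, you point HuggingFaceModel at it through image_uri. Every URI, path, role, and instance type below is a placeholder, not the exact values I used:

```python
from sagemaker.huggingface import HuggingFaceModel

# All identifiers below are placeholders: substitute your own custom
# ECR image, S3 model artifact, and SageMaker execution role.
model = HuggingFaceModel(
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/paligemma-inference:latest",
    model_data="s3://<bucket>/paligemma/model.tar.gz",
    role="<sagemaker-execution-role-arn>",
)

# Deploy to a real-time endpoint; a GPU instance with enough memory
# for the sharded weights is assumed here.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)
```

Because image_uri is supplied explicitly, the SDK uses your custom container instead of resolving one of the stock images that only ship Transformers 4.37.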