Cannot load model on SageMaker

#22
by jamie-relive - opened

I've just deployed the facebook/detr-resnet-50 model via the provided SageMaker SDK Python script. I ran into the same issue mentioned here, so I added the 'HF_MODEL_REVISION': 'no_timm' parameter. However, when I try to invoke the SageMaker endpoint, I get the following error message:

An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary and could not load the entire response body
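
For reference, the deployment and invocation looked roughly like the sketch below; the container version pins, instance type and test image are placeholders rather than exactly what I ran:

```python
import boto3
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()

# Hub configuration passed to the container as environment variables.
hub = {
    "HF_MODEL_ID": "facebook/detr-resnet-50",
    "HF_TASK": "object-detection",
    "HF_MODEL_REVISION": "no_timm",  # added after hitting the issue mentioned above
}

huggingface_model = HuggingFaceModel(
    env=hub,
    role=role,
    transformers_version="4.37",  # assumed pins for a PyTorch 2.x inference DLC
    pytorch_version="2.1",
    py_version="py310",
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",  # assumed instance type
)

# Invocation that produces the 400 above.
runtime = boto3.client("sagemaker-runtime")
with open("test.jpg", "rb") as f:  # hypothetical local test image
    response = runtime.invoke_endpoint(
        EndpointName=predictor.endpoint_name,
        ContentType="image/jpeg",
        Body=f.read(),
    )
```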

Looking through the CloudWatch logs I found these two messages:
Could not load model /.sagemaker/mms/models/facebook__detr-resnet-50.no_timm with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForObjectDetection'>, <class 'transformers.models.detr.modeling_detr.DetrForObjectDetection'>).
and
OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory /.sagemaker/mms/models/facebook__detr-resnet-50.no_timm.
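
To narrow this down, I think a quick local check along these lines (outside the container, with a reasonably recent transformers install) should show whether the no_timm revision is loadable at all with the classes named in that first log line:

```python
# Local sanity check, outside SageMaker: can the no_timm revision be loaded
# with the same Auto class the container tried? Assumes a recent transformers
# release (with safetensors support) installed locally.
from transformers import AutoImageProcessor, AutoModelForObjectDetection

model = AutoModelForObjectDetection.from_pretrained(
    "facebook/detr-resnet-50", revision="no_timm"
)
processor = AutoImageProcessor.from_pretrained(
    "facebook/detr-resnet-50", revision="no_timm"
)
print(type(model).__name__)  # expect DetrForObjectDetection if the checkpoint loads
```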

Has anyone come across this or know how this can be resolved?

After further investigation, I realised that this model was created with PyTorch 1.x, not the 2.x version I had been trying to use. Switching to an older Hugging Face inference container that runs PyTorch 1.13.1 did load the model, but then produced a new error, Object of type ResNetConfig is not JSON serializable, which meant I still could not use the model.
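
One thing I started sketching, but haven't verified end to end, is packaging a custom inference.py with the model so the handler only ever returns plain Python types to the JSON encoder. The hook names below follow the SageMaker Hugging Face Inference Toolkit; the raw-bytes input handling and the post-processing details are my own assumptions:

```python
# inference.py -- rough, untested sketch of the SageMaker Hugging Face Inference
# Toolkit override hooks, written so the handler only returns plain Python types
# to the JSON encoder.
import io
import json

import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForObjectDetection

MODEL_ID = "facebook/detr-resnet-50"
REVISION = "no_timm"


def model_fn(model_dir):
    # Load the processor and model once per worker.
    # For simplicity this loads from the Hub directly rather than from model_dir.
    processor = AutoImageProcessor.from_pretrained(MODEL_ID, revision=REVISION)
    model = AutoModelForObjectDetection.from_pretrained(MODEL_ID, revision=REVISION)
    model.eval()
    return {"processor": processor, "model": model}


def input_fn(input_data, content_type):
    # Assumes the client sends raw image bytes (e.g. image/jpeg).
    return input_data


def predict_fn(data, artifacts):
    image = Image.open(io.BytesIO(data)).convert("RGB")
    processor, model = artifacts["processor"], artifacts["model"]

    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # target_sizes expects (height, width); PIL's size is (width, height).
    results = processor.post_process_object_detection(
        outputs, target_sizes=torch.tensor([image.size[::-1]]), threshold=0.9
    )[0]

    # Convert tensors to plain floats/strings so json.dumps cannot choke on them.
    return [
        {
            "score": float(score),
            "label": model.config.id2label[int(label)],
            "box": [float(x) for x in box],
        }
        for score, label, box in zip(
            results["scores"], results["labels"], results["boxes"]
        )
    ]


def output_fn(prediction, accept):
    return json.dumps(prediction)
```

In principle this could be wired in via the entry_point/source_dir arguments of HuggingFaceModel, or as code/inference.py inside a model.tar.gz, but I haven't confirmed that it actually gets past the ResNetConfig serialization error.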

This may be an issue with the transformers version included in that container. Ultimately, it doesn't look like there is a clear way to get this model hosted and working on SageMaker using this method.
