Error deploying on SageMaker

#7
by Akanshu - opened

RuntimeError(f"weight {tensor_name} does not exist")
RuntimeError: weight model.layers.0.self_attn.rotary_emb.inv_freq does not exist
Error: ShardCannotStart

Thank you in advance; please help with this deployment.

Facing the same issue (Error: ShardCannotStart) while deploying CodeLlama via Hugging Face.

Hey, you need the mainline version of 🤗 Transformers, installed from git, to run this (https://huggingface.co./codellama/CodeLlama-7b-hf#model-use). There's no container for it yet on SageMaker (I guess you are both using the 0.9.3 container), so you'll have to run it outside of SageMaker or load it directly on a notebook instance. That's what I'm doing for now, until this is supported.
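To check whether an environment already has a new enough Transformers build before trying to load the model, something like the following may help. It is only a sketch: the helper names are made up for illustration, and the 4.33 threshold is an assumption (Code Llama support appears to have first shipped in that release line, so please verify it against the release notes).

```python
import re
from importlib import metadata
from typing import Optional, Tuple

# Assumption: Code Llama support first shipped in the 4.33 release line.
MIN_VERSION = (4, 33)


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '4.33.0.dev0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def supports_code_llama(version: str) -> bool:
    """Return True if the given transformers version meets MIN_VERSION."""
    return parse_major_minor(version) >= MIN_VERSION


def installed_transformers_version() -> Optional[str]:
    """Return the installed transformers version, or None if it is absent."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_transformers_version()
    if found is None:
        print("transformers is not installed")
    elif supports_code_llama(found):
        print(f"transformers {found} should be new enough")
    else:
        print(f"transformers {found} is too old; install from git main")
```

A development build installed from git reports a version such as `4.33.0.dev0`, which this check treats as new enough.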

I just created the notebook instance, made a Jupyter notebook inside it, and ran this code… Can you please elaborate on how to deploy?

That's about as far as I've got. I'm following the documentation here: https://huggingface.co./docs/transformers/main/model_doc/code_llama
plus the pip install from git in this model's README, and then just using the notebook to play with it. As I said, there's no easy way to deploy it as an actual inference endpoint (you could build your own container with the required versions, though). Good luck!
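For anyone trying the notebook route, a minimal sketch of loading the model and generating a completion could look like this. It assumes `transformers` (from git main), `accelerate`, and `torch` are installed and that the instance has a GPU with enough memory for fp16 weights. The function names `load_code_llama` and `complete` are illustrative, not part of any library.

```python
# In a notebook cell, first install the dependencies, for example:
#   pip install git+https://github.com/huggingface/transformers.git@main accelerate

MODEL_ID = "codellama/CodeLlama-7b-hf"


def load_code_llama(model_id: str = MODEL_ID):
    """Load the tokenizer and model; imports are local so this cell is cheap to define."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # halves memory use compared with fp32
        device_map="auto",          # requires accelerate; places weights on available devices
    )
    return tokenizer, model


def complete(prompt: str, tokenizer, model, max_new_tokens: int = 64) -> str:
    """Generate a continuation of `prompt` and return the decoded text."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Usage (downloads several GB of weights on first run):
#   tokenizer, model = load_code_llama()
#   print(complete("def fibonacci(n):", tokenizer, model))
```

This only runs the model inside the notebook kernel; it does not create a SageMaker endpoint.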

PS: you can use TheBloke's GPTQ build and run it on multiple GPUs if you pip install auto-gptq optimum.
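A rough sketch of the GPTQ route follows. The repository id `TheBloke/CodeLlama-7B-GPTQ` is an assumption about which build is meant, and the helper names are hypothetical. The sketch first checks that the two extra packages are importable, then loads the quantized checkpoint through the regular Transformers API, with `device_map="auto"` spreading layers across the visible GPUs.

```python
from importlib.util import find_spec
from typing import List, Sequence

# Assumption: this is the GPTQ repository the post refers to.
GPTQ_MODEL_ID = "TheBloke/CodeLlama-7B-GPTQ"

# Import names for the packages installed by `pip install auto-gptq optimum`.
REQUIRED_MODULES = ("auto_gptq", "optimum")


def missing_modules(modules: Sequence[str] = REQUIRED_MODULES) -> List[str]:
    """Return the module names from `modules` that cannot be found."""
    return [name for name in modules if find_spec(name) is None]


def load_gptq_model(model_id: str = GPTQ_MODEL_ID):
    """Load a GPTQ-quantized checkpoint, sharded across available GPUs."""
    missing = missing_modules()
    if missing:
        raise RuntimeError(f"missing packages for GPTQ loading: {missing}")

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # device_map="auto" (via accelerate) distributes layers over all visible GPUs.
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
    return tokenizer, model


# Usage (requires at least one CUDA GPU):
#   tokenizer, model = load_gptq_model()
```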
