codellama/CodeLlama-7b-hf · Error deploying on SageMaker

Aug 25, 2023

edited Aug 25, 2023

RuntimeError(f"weight {tensor_name} does not exist")
RuntimeError: weight model.layers.0.self_attn.rotary_emb.inv_freq does not exist
Error: ShardCannotStart

Thank you in advance; please help with this deployment.
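The message means the serving container looked for a tensor named model.layers.0.self_attn.rotary_emb.inv_freq and didn't find it in the checkpoint. If you want to confirm that yourself, here's a minimal sketch that lists the tensor names in locally downloaded .safetensors shards by reading only each file's JSON header (the safetensors format prefixes that header with its length as an 8-byte little-endian integer). The helper names are hypothetical, and it assumes the weights were downloaded as .safetensors rather than .bin files:

```python
import json
import struct
from pathlib import Path


def tensor_names_from_header(blob: bytes) -> list:
    """Parse tensor names from the start of a .safetensors file.

    The format begins with an 8-byte little-endian unsigned integer N,
    followed by N bytes of JSON mapping tensor names to dtype/shape/offsets.
    """
    (header_len,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8:8 + header_len])
    # "__metadata__" is an optional free-form entry, not a tensor.
    return sorted(name for name in header if name != "__metadata__")


def safetensors_tensor_names(path) -> list:
    """Read only the header of one shard; the tensor data is never loaded."""
    with open(path, "rb") as f:
        prefix = f.read(8)
        (header_len,) = struct.unpack("<Q", prefix)
        return tensor_names_from_header(prefix + f.read(header_len))


def find_missing(checkpoint_dir, expected_names) -> list:
    """Return the expected tensor names that appear in none of the shards."""
    present = set()
    for shard in Path(checkpoint_dir).glob("*.safetensors"):
        present.update(safetensors_tensor_names(shard))
    return [name for name in expected_names if name not in present]
```

For example, find_missing("CodeLlama-7b-hf", ["model.layers.0.self_attn.rotary_emb.inv_freq"]), with "CodeLlama-7b-hf" standing in for wherever you downloaded the model. If the name comes back as missing, the checkpoint simply doesn't ship that tensor, which would suggest the problem is the loader in the container rather than your endpoint settings.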

akgq

Aug 28, 2023

I'm facing the same issue (Error: ShardCannotStart) while deploying CodeLlama via Hugging Face.

Choms

Aug 28, 2023

Hey, you need the main-branch version of 🤗 Transformers, installed from git, to run this (https://huggingface.co./codellama/CodeLlama-7b-hf#model-use). There's no SageMaker container that supports it yet (I'm guessing you're both using the 0.9.3 container), so you'll have to run it outside of SageMaker or load it directly on a notebook instance. That's what I'm doing for now, until this is supported.
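Since the fix is a newer transformers than the container ships, a quick way to check a given environment is to compare the installed version against the first release with Code Llama support. A minimal sketch using only the standard library; the 4.33.0 threshold is my assumption about where that support landed (a main-branch install from around this time should report something like 4.33.0.dev0, which the check accepts), and the helper names are hypothetical:

```python
import re
from importlib import metadata
from typing import Optional

# Assumption: 4.33.0 is the first transformers release with Code Llama support.
MIN_TRANSFORMERS = (4, 33, 0)


def parse_release(version: str) -> tuple:
    """Keep the leading numeric release part: "4.33.0.dev0" -> (4, 33, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # e.g. the "dev0" segment
        parts.append(int(match.group()))
        if match.end() != len(piece):
            break  # e.g. "0rc1": keep the 0, drop the suffix
    return tuple(parts)


def supports_code_llama(version: str) -> bool:
    """True if the version is at least MIN_TRANSFORMERS (dev builds of it count)."""
    release = parse_release(version)
    release += (0,) * (len(MIN_TRANSFORMERS) - len(release))  # "4.34" -> (4, 34, 0)
    return release >= MIN_TRANSFORMERS


def installed_transformers_version() -> Optional[str]:
    """Installed transformers version, or None if it isn't installed."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None
```

In the notebook, supports_code_llama(installed_transformers_version() or "0") tells you whether the git install took effect (restart the kernel after installing).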

Akanshu

Aug 28, 2023

I just created a notebook instance, created a Jupyter notebook inside it, and ran this code… Can you please explain how to deploy it?

Choms

Aug 28, 2023

That's about as far as I've got. I'm following the documentation here: https://huggingface.co./docs/transformers/main/model_doc/code_llama
plus the pip install from git in this model's README, and then just using the notebook to experiment with it. As I said, there's no easy way to deploy it as an actual inference endpoint yet (though you could build your own container with the required versions). Good luck!
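For the experiment-in-a-notebook route, here's a minimal sketch of loading the model with a transformers text-generation pipeline. It assumes a GPU notebook instance with torch, accelerate and the git build of transformers installed; the sampling values are illustrative rather than taken from the model card, and the helper names are hypothetical:

```python
# Assumes torch, accelerate and a Code Llama-capable transformers are installed.
MODEL_ID = "codellama/CodeLlama-7b-hf"

# Illustrative sampling settings, not tuned.
GENERATION_KWARGS = {
    "do_sample": True,
    "temperature": 0.1,
    "top_p": 0.95,
    "max_new_tokens": 128,
}


def load_generator(model_id: str = MODEL_ID):
    """Build a text-generation pipeline; imports are deferred until called."""
    import torch
    from transformers import pipeline

    return pipeline(
        "text-generation",
        model=model_id,
        torch_dtype=torch.float16,  # roughly halves memory compared with float32
        device_map="auto",  # let accelerate place the weights on the GPU(s)
    )


def complete(generator, prompt: str) -> str:
    """Return the prompt followed by the model's continuation."""
    outputs = generator(prompt, **GENERATION_KWARGS)
    return outputs[0]["generated_text"]
```

Usage: gen = load_generator() and then print(complete(gen, "def fibonacci(n):")). Keeping the imports inside load_generator means the cell can be defined before the dependencies are installed.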

PS: you can also use TheBloke's GPTQ build and run it on multiple GPUs if you pip install auto-gptq optimum.
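A rough sketch of that GPTQ route through transformers' from_pretrained, assuming pip install auto-gptq optimum has been run, a transformers version with GPTQ support, and that the repo is named TheBloke/CodeLlama-7B-GPTQ (check the exact name on the Hub). device_map="auto" lets accelerate split the layers across the visible GPUs; per_gpu_memory is a hypothetical helper that builds the optional max_memory argument:

```python
from typing import Optional

# Assumption: verify the exact repo name on the Hub before running.
GPTQ_REPO = "TheBloke/CodeLlama-7B-GPTQ"


def per_gpu_memory(num_gpus: int, per_gpu_gib: int) -> dict:
    """Build a max_memory mapping (GPU index -> upper limit) for from_pretrained."""
    if num_gpus < 1:
        raise ValueError("no GPUs visible")
    return {i: f"{per_gpu_gib}GiB" for i in range(num_gpus)}


def load_gptq(repo: str = GPTQ_REPO, per_gpu_gib: Optional[int] = None):
    """Load the quantized checkpoint across all visible GPUs.

    Requires auto-gptq and optimum; imports are deferred until called.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    kwargs = {"device_map": "auto"}  # accelerate splits layers across the GPUs
    if per_gpu_gib is not None:
        kwargs["max_memory"] = per_gpu_memory(torch.cuda.device_count(), per_gpu_gib)
    tokenizer = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForCausalLM.from_pretrained(repo, **kwargs)
    return tokenizer, model
```

For example, tokenizer, model = load_gptq(per_gpu_gib=6) on a two-GPU machine caps each card at 6GiB; leaving per_gpu_gib unset lets accelerate decide the split on its own.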