Runtime error

model.safetensors: 100%|█████████▉| 8.91G/8.91G [01:07<00:00, 132MB/s]
GPTBigCodeGPTQForCausalLM hasn't fused attention module yet, will skip inject fused attention.
GPTBigCodeGPTQForCausalLM hasn't fused mlp module yet, will skip inject fused mlp.
Traceback (most recent call last):
  File "/home/user/app/app.py", line 22, in <module>
    model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
  File "/home/user/.local/lib/python3.10/site-packages/auto_gptq/modeling/auto.py", line 119, in from_quantized
    return quant_func(
  File "/home/user/.local/lib/python3.10/site-packages/auto_gptq/modeling/_base.py", line 1036, in from_quantized
    model = autogptq_post_init(model, use_act_order=quantize_config.desc_act)
  File "/home/user/.local/lib/python3.10/site-packages/auto_gptq/modeling/_utils.py", line 380, in autogptq_post_init
    submodule.post_init(temp_dq = model.device_tensors[device])
  File "/home/user/.local/lib/python3.10/site-packages/auto_gptq/nn_modules/qlinear/qlinear_exllamav2.py", line 140, in post_init
    assert self.qweight.device.type == "cuda"
AssertionError
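The assertion fires in AutoGPTQ's ExLlamaV2 QLinear `post_init`, which requires the quantized weights (`qweight`) to already sit on a CUDA device. On CPU-only hardware (for example a Hugging Face Space with no GPU assigned) the weights are loaded to CPU, so the check fails before inference ever starts. Two common workarounds: run on GPU hardware and load the model with `device="cuda:0"`, or disable the ExLlama kernels so a CPU-compatible path is used. Below is a minimal sketch under those assumptions; the repo id is a placeholder, and the `disable_exllama` / `disable_exllamav2` keyword names vary across auto-gptq releases, so check the `from_quantized` signature of your installed version.

```python
import torch
from auto_gptq import AutoGPTQForCausalLM

# Placeholder — substitute the actual quantized repo id used in app.py.
model_name_or_path = "user/model-GPTQ"

if torch.cuda.is_available():
    # The ExLlama(V2) kernels require qweight on a CUDA device, which is
    # exactly what the failing assert in qlinear_exllamav2.py checks.
    model = AutoGPTQForCausalLM.from_quantized(
        model_name_or_path,
        device="cuda:0",
        use_safetensors=True,
    )
else:
    # CPU-only fallback: skip the CUDA-only kernels entirely.
    # Keyword names are version-dependent (assumption: an auto-gptq release
    # that exposes both disable_exllama and disable_exllamav2).
    model = AutoGPTQForCausalLM.from_quantized(
        model_name_or_path,
        device="cpu",
        use_safetensors=True,
        disable_exllama=True,
        disable_exllamav2=True,
    )
```

Even with the kernels disabled, GPTQ inference on CPU is very slow; in practice the fix for a Space is usually to assign GPU hardware rather than to fall back to CPU.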
