Multi-GPU Inference Error (Expected all tensors to be on the same device, but found at least two devices; `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed)
Discussion #3, opened by MikeZhang0701
Hi!
I am running inference with AI4Chem/ChemVLM-26B on four NVIDIA RTX 4090 GPUs. I load and use the model as follows:
```python
import torch
from transformers import AutoModel, AutoTokenizer

path = 'AI4Chem/ChemVLM-26B'  # Hub id or local checkpoint directory

model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    device_map='auto').eval()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)

# set the max number of tiles in `max_num`
# (load_image is assumed to be the image-tiling helper from the model card's example code, not shown here)
pixel_values = load_image('./examples/image1.jpg', max_num=6).to(torch.bfloat16).cuda()
generation_config = dict(
    num_beams=1,
    max_new_tokens=512,
    do_sample=False,
)

# single-round single-image conversation
question = "Please describe the image in detail."  # the original post used the Chinese prompt "请详细描述图片"
response = model.chat(tokenizer, pixel_values, question, generation_config)
print(question, response)
```
However, it raises the following error:

```
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:3!
```
How can I solve the problem?
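This error means a single operation received input tensors on two different GPUs; the post does not show which operation failed. One way to see what `device_map='auto'` actually did is to inspect `model.hf_device_map`, the dictionary Transformers records when a model is loaded with a device map, and check whether the modules at the input and output ends of the model landed on the same card. The helpers below are a hypothetical sketch, and the module-name prefixes are assumptions based on the InternLM2-style names used in the custom device map later in this post.

```python
from collections import defaultdict


def devices_by_prefix(device_map, prefixes):
    """Group device-map entries by module-name prefix.

    `device_map` is a dict such as `model.hf_device_map`, mapping module or
    parameter names to a device (an int GPU index or a string like "cpu").
    Returns {prefix: sorted list of distinct devices used under that prefix}.
    """
    found = defaultdict(set)
    for name, device in device_map.items():
        for prefix in prefixes:
            # Match the prefix itself or any submodule/parameter below it.
            if name == prefix or name.startswith(prefix + "."):
                found[prefix].add(str(device))
    return {p: sorted(found[p]) for p in prefixes if p in found}


def boundary_is_split(device_map, prefixes):
    """Return True if the listed boundary modules span more than one device."""
    used = set()
    for devices in devices_by_prefix(device_map, prefixes).values():
        used.update(devices)
    return len(used) > 1


# Hypothetical prefixes for the input/output ends of an InternLM2-based
# InternVL-style model (an assumption; adjust to the real module names).
BOUNDARY = [
    "vision_model",
    "mlp1",
    "language_model.model.tok_embeddings",
    "language_model.model.norm",
    "language_model.output",
]

# Illustrative map in which the output side sits on GPU 3.
example_map = {
    "vision_model": 0,
    "mlp1": 0,
    "language_model.model.tok_embeddings": 0,
    "language_model.model.layers.0": 0,
    "language_model.model.layers.47": 3,
    "language_model.model.norm": 3,
    "language_model.output": 3,
}
print(devices_by_prefix(example_map, BOUNDARY))
print(boundary_is_split(example_map, BOUNDARY))
```

After loading, `boundary_is_split(model.hf_device_map, BOUNDARY)` returning True is a hint about where tensors may cross devices, not proof of the cause.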
I also looked through the related issues in the InternVL repository and tried passing a custom `device_map` when loading the model:
```python
import torch
from transformers import AutoModel

path = 'AI4Chem/ChemVLM-26B'  # Hub id or local checkpoint directory

# Vision encoder, projector, token embeddings, final norm and output head on GPU 0
device_map = {
    'vision_model': 0,
    'mlp1': 0,
    'language_model.model.tok_embeddings': 0,
    'language_model.model.norm': 0,
    'language_model.output.weight': 0
}
# Decoder layers 0-47, 16 per GPU, on GPUs 1, 2 and 3
for i in range(16):
    device_map[f'language_model.model.layers.{i}'] = 1
for i in range(16, 32):
    device_map[f'language_model.model.layers.{i}'] = 2
for i in range(32, 48):
    device_map[f'language_model.model.layers.{i}'] = 3
print(device_map)
# device_map = 'auto'
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    device_map=device_map).eval()
```
However, this produces a different error:

```
/opt/conda/conda-bld/pytorch_1724789172399/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [114,0,0], thread: [95,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
```
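The assertion text states the condition every index must satisfy, `-size <= index < size`, and it fired inside a CUDA indexing kernel, so some index tensor held a value outside the dimension it indexes; the post does not show which one. Because CUDA errors are reported asynchronously, the Python traceback often points at a later, unrelated line. A common first step is to rerun with the `CUDA_LAUNCH_BLOCKING=1` environment variable set before CUDA is initialised, and, where the indices are available on the host, to check them directly. The sketch below does both; `out_of_bounds` is a hypothetical helper, not part of PyTorch.

```python
import os

# Make CUDA kernel launches synchronous so a device-side assertion surfaces at
# the operation that triggered it. This only takes effect if it is set before
# CUDA is initialised, so do it before `import torch` in a fresh process.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def out_of_bounds(indices, size):
    """Return the indices that violate `-size <= index < size`.

    This mirrors the condition in the assertion message, e.g. for checking
    token ids against the number of rows in an embedding table.
    """
    return [i for i in indices if not (-size <= i < size)]


# Hypothetical ids checked against a table with 5 rows.
print(out_of_bounds([0, 4, 5, -5, -6], 5))  # [5, -6]
```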
Could you please take a look at this problem, and in particular explain how to load and run the model on multiple GPUs?
Thank you very much!
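For comparison, the InternVL2 model cards describe a `split_model` helper that counts GPU 0 as half a GPU (because it also holds the vision encoder) and keeps the embeddings, final norm, output head and the last decoder layer on GPU 0 together with the vision side. The custom map in the post instead puts all decoder layers on GPUs 1-3, so the last layer sits on a different card from the output head. The function below is a sketch in the spirit of that helper (a hypothetical `build_device_map`, not code from the ChemVLM or InternVL repositories); the 48-layer count and the InternLM2 module names are assumptions taken from the post's device map.

```python
import math


def build_device_map(num_layers, world_size):
    """Spread decoder layers over `world_size` GPUs, GPU 0 taking about half a share.

    Assumes InternLM2-style module names inside an InternVL-style wrapper.
    """
    if world_size < 1:
        raise ValueError("world_size must be >= 1")
    device_map = {}
    # GPU 0 also hosts the vision encoder, so count it as half a GPU.
    per_gpu = math.ceil(num_layers / (world_size - 0.5))
    shares = [per_gpu] * world_size
    shares[0] = math.ceil(per_gpu * 0.5)
    layer = 0
    for gpu, share in enumerate(shares):
        for _ in range(share):
            if layer >= num_layers:
                break
            device_map[f"language_model.model.layers.{layer}"] = gpu
            layer += 1
    # Input/output boundary modules stay together on GPU 0.
    for name in ("vision_model", "mlp1",
                 "language_model.model.tok_embeddings",
                 "language_model.model.norm",
                 "language_model.output"):
        device_map[name] = 0
    # Keep the last decoder layer on GPU 0, next to the final norm and head.
    device_map[f"language_model.model.layers.{num_layers - 1}"] = 0
    return device_map


layout = build_device_map(48, 4)
print(sorted(v for k, v in layout.items() if ".layers." in k).count(0))
```

It would be passed as `device_map=build_device_map(48, torch.cuda.device_count())` to `AutoModel.from_pretrained`; whether it avoids either error for ChemVLM-26B is untested.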
Have you tried Accelerate or DeepSpeed for multi-GPU inference? Also, for evaluation, a single RTX 4090 works.
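As a rough check on the single-card remark: model weights take about 2 bytes per parameter in bf16, 1 byte in int8 and 0.5 bytes at 4 bits. The arithmetic below counts weights only (no activations, KV cache or quantisation overhead) and uses the nominal 26B from the model name. It suggests that running this model on one 24 GB RTX 4090 presumably relies on quantisation or offloading, which is an assumption since the reply does not say which setup was used. The same numbers show why four cards give roughly 13 GB of bf16 weights each.

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Memory for the model weights alone, in GB (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


params = 26e9   # nominal size from the model name; the exact count differs slightly
gpu_gb = 24     # memory of one RTX 4090
num_gpus = 4    # the setup in the original post

for label, nbytes in [("bf16", 2), ("int8", 1), ("4-bit", 0.5)]:
    total = weight_memory_gb(params, nbytes)
    print(f"{label}: {total:.0f} GB total, {total / num_gpus:.1f} GB per GPU "
          f"on {num_gpus} GPUs, fits on one {gpu_gb} GB card: {total < gpu_gb}")
```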