Run inference on 2 GPUs

#112
by bweinstein123

Hi,

I have 2 RTX 6000 GPUs, but I can't figure out how to run the following code on both GPUs.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.half().cuda()  # this loads the whole model onto a single GPU

text = "Hello, how are you?"  # example prompt
inputs = tokenizer(text, return_tensors="pt")
inputs_gpu = {key: value.to("cuda") for key, value in inputs.items()}

outputs = model.generate(**inputs_gpu, max_new_tokens=500)
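
For reference, a minimal sketch of one common way to do this: passing device_map="auto" to from_pretrained (which requires the accelerate package) shards the model's layers across all visible GPUs. The prompt text here is a placeholder, not from the original post.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# device_map="auto" splits the layers across both GPUs;
# torch_dtype=torch.float16 matches the .half() call above
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

text = "Hello, how are you?"  # placeholder prompt
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=500)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

With this setup, inputs only need to be moved to model.device (the device holding the first layers); accelerate's hooks move activations between the GPUs during generate automatically.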
