How to move our Llama object to the GPU
#1 opened by arshiahemmat
Hey guys!
Thanks for your amazing implementation!
I want to move my model to the GPU, but neither ".to(cuda)" nor "device = cuda" works (I can run it on the CPU just fine).
from llama_cpp import Llama

llm = Llama(
    model_path="Dorna-Llama3-8B-Instruct-GGUF/dorna-llama3-8b-instruct.Q8_0.gguf",
    chat_format="llama-3",
    n_gpu_layers=-1,  # -1 asks llama.cpp to offload all layers to the GPU
    n_ctx=2048,
)
So, could you please give me some tips on how to do this properly?
Thanks for your effort and time!
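(A quick sanity check, assuming a recent version of llama-cpp-python that exposes the low-level llama_supports_gpu_offload helper: if it returns False, the installed wheel was built without a GPU backend, and n_gpu_layers is silently ignored.)

from llama_cpp import llama_supports_gpu_offload

# False means the wheel was compiled CPU-only, so n_gpu_layers=-1 has
# no effect and the model will always run on the CPU.
print(llama_supports_gpu_offload())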
Hello friends,
I have the same problem as above (arshiahemmat's comment): my code does not run on the GPU, so the response time is high.
Please reply to this comment.
Hi!
Please check this issue: https://github.com/abetlen/llama-cpp-python/issues/576
Alternatively, you can use ollama (https://ollama.com/).
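For reference, the linked issue boils down to this: the default llama-cpp-python wheels are CPU-only, so n_gpu_layers only takes effect after reinstalling the package with a GPU backend compiled in. A minimal sketch, assuming an NVIDIA GPU with the CUDA toolkit installed (the CMake flag is -DGGML_CUDA=on in current releases; older releases used -DLLAMA_CUBLAS=on):

# Reinstall with the CUDA backend compiled in:
#   CMAKE_ARGS="-DGGML_CUDA=on" pip install --force-reinstall --no-cache-dir llama-cpp-python

from llama_cpp import Llama

llm = Llama(
    model_path="Dorna-Llama3-8B-Instruct-GGUF/dorna-llama3-8b-instruct.Q8_0.gguf",
    chat_format="llama-3",
    n_gpu_layers=-1,  # offload every layer to the GPU
    n_ctx=2048,
    verbose=True,     # the load log should now report layers offloaded to the GPU
)

If the startup log still reports zero layers offloaded, the build did not pick up CUDA; see the issue thread above for platform-specific install commands.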
mohalisad changed discussion status to closed.