How to move our Llama object to the GPU
#1 opened by arshiahemmat
Hey guys!
Thanks for your amazing implementation!
I want to move my model to the GPU, but neither ".to(cuda)" nor "device = cuda" works (I can run it on the CPU just fine).
from llama_cpp import Llama

llm = Llama(
    model_path="Dorna-Llama3-8B-Instruct-GGUF/dorna-llama3-8b-instruct.Q8_0.gguf",
    chat_format="llama-3",
    n_gpu_layers=-1,  # -1 asks llama.cpp to offload all layers to the GPU
    n_ctx=2048,
)
So, could you please give me some tips on how to do this properly?
Thanks for your effort and time!
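(A quick sanity check, assuming a recent version of llama-cpp-python that exposes the low-level llama_supports_gpu_offload helper: if it returns False, the installed wheel was built without a GPU backend, and n_gpu_layers is silently ignored.)

from llama_cpp import llama_supports_gpu_offload

# False means the wheel was compiled CPU-only, so n_gpu_layers=-1 has
# no effect and the model will always run on the CPU.
print(llama_supports_gpu_offload())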
Hello friends,
I have the same problem as above (arshiahemmat's comment): my code does not run on the GPU, so the response time is high.
Please reply to this comment.
Hi!
Please check this issue: https://github.com/abetlen/llama-cpp-python/issues/576
Alternatively, you can use ollama (https://ollama.com/).
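For reference, the linked issue boils down to this: the default llama-cpp-python wheels are CPU-only, so n_gpu_layers only takes effect after reinstalling the package with a GPU backend compiled in. A minimal sketch, assuming an NVIDIA GPU with the CUDA toolkit installed (the CMake flag is -DGGML_CUDA=on in current releases; older releases used -DLLAMA_CUBLAS=on):

# Reinstall with the CUDA backend compiled in:
#   CMAKE_ARGS="-DGGML_CUDA=on" pip install --force-reinstall --no-cache-dir llama-cpp-python

from llama_cpp import Llama

llm = Llama(
    model_path="Dorna-Llama3-8B-Instruct-GGUF/dorna-llama3-8b-instruct.Q8_0.gguf",
    chat_format="llama-3",
    n_gpu_layers=-1,  # offload every layer to the GPU
    n_ctx=2048,
    verbose=True,     # the load log should now report layers offloaded to the GPU
)

If the startup log still reports zero layers offloaded, the build did not pick up CUDA; see the issue thread above for platform-specific install commands.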
mohalisad changed discussion status to closed.