Running this model without quantization.
#9 · opened by Daaku-C5
I'm trying to run this model on a VM I have. Just a quick question: what is the minimum GPU memory requirement to run this 8B model without quantization? I'm currently using a 24 GB GPU.
ERROR:
CUDA out of memory. Tried to allocate 64.00 MiB. GPU 0 has a total capacity of 21.99 GiB of which 23.75 MiB is free.
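For reference, a back-of-the-envelope estimate: in bf16/fp16 (2 bytes per parameter), an 8B model needs roughly 8e9 × 2 ≈ 16 GB for the weights alone, before activations and the KV cache, so a 24 GB card is normally enough if nothing else is holding GPU memory (note the error reports only 21.99 GiB total capacity with 23.75 MiB free, which suggests another process or a previous run is occupying the card). Below is a minimal half-precision loading sketch with Transformers; the model_id is a placeholder, not the actual repo:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "org/model-8b"  # placeholder; substitute the real repo ID

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # half precision, not quantized: ~16 GB of weights
        device_map="auto",           # let accelerate place the layers on the GPU
    )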