Running this model without quantization.

#9 opened by Daaku-C5

I'm trying to use this model on a VM that I have. Just a silly question: what are the minimum requirements to run this 8B model without quantization? I'm currently using a 24 GB GPU.
ERROR:
CUDA out of memory. Tried to allocate 64.00 MiB. GPU 0 has a total capacity of 21.99 GiB of which 23.75 MiB is free.
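
For reference, a rough back-of-the-envelope check: in full fp32 an 8B model needs about 32 GB for the weights alone (8B params × 4 bytes), which overflows a 24 GB card, whereas in bf16/fp16 the weights take roughly 16 GB and leave headroom for activations and the KV cache. A common cause of this exact OOM is loading without specifying a dtype, which defaults to fp32. Below is a minimal, hedged sketch of loading in half precision with transformers; the model id is a placeholder for whichever repo this discussion belongs to, and it assumes `transformers` and `accelerate` are installed:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model id -- substitute the actual 8B checkpoint from this repo.
model_id = "meta-llama/Meta-Llama-3-8B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision: ~16 GB of weights for 8B params
    device_map="auto",           # requires accelerate; places weights on the GPU
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Even in bf16, long prompts or large batch sizes can still exhaust the remaining ~8 GB via the KV cache, so it's worth verifying nothing else is occupying the GPU before loading.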
