Running this model without quantization.
#9 · opened by Daaku-C5
I'm trying to run this model on a VM I have. Just a quick question: what is the minimum GPU memory requirement to run this 8B model without quantization? I'm currently using a 24 GB GPU.
ERROR:
CUDA out of memory. Tried to allocate 64.00 MiB. GPU 0 has a total capacity of 21.99 GiB of which 23.75 MiB is free.
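For reference, a back-of-the-envelope estimate: in bf16/fp16 (2 bytes per parameter), an 8B model needs roughly 8e9 × 2 ≈ 16 GB for the weights alone, before activations and the KV cache, so a 24 GB card is normally enough if nothing else is holding GPU memory (note the error reports only 21.99 GiB total capacity with 23.75 MiB free, which suggests another process or a previous run is occupying the card). Below is a minimal half-precision loading sketch with Transformers; the model_id is a placeholder, not the actual repo:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "org/model-8b"  # placeholder; substitute the real repo ID

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # half precision, not quantized: ~16 GB of weights
        device_map="auto",           # let accelerate place the layers on the GPU
    )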