Memory usage
#1 by HoagyD - opened
I'm trying to load this onto two 3090s and it is OOMing. How much memory did you need to be able to load this?
@HoagyD Sorry about that! Use the updated command; a 32k-token context does error out on 48 GB.
This is the only solution I have found, since all the 4-bit quants I tried act really dumb.
You can also set the cache dtype in vLLM and fit 32k tokens, but it comes at the cost of huge overhead.
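For a rough sense of why the context length and cache dtype matter, here is a back-of-the-envelope sketch of the KV cache size for a single sequence. I am assuming "cache" here means the KV cache. The layer count, KV head count, and head dimension are hypothetical placeholders, not this model's real config, so swap in the values from the model's config.json. It also ignores weights, activations, and any runtime overhead.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_len, bytes_per_elem):
    """Approximate KV cache size in bytes for one sequence."""
    # Factor of 2: one tensor for keys and one for values in every layer.
    return 2 * num_layers * num_kv_heads * head_dim * context_len * bytes_per_elem


# Hypothetical model shape (placeholders, NOT taken from this model).
LAYERS, KV_HEADS, HEAD_DIM = 60, 8, 128
CTX = 32 * 1024  # 32k-token context

fp16_bytes = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CTX, bytes_per_elem=2)
fp8_bytes = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CTX, bytes_per_elem=1)

GIB = 2**30
print(f"16-bit cache: {fp16_bytes / GIB:.2f} GiB")
print(f" 8-bit cache: {fp8_bytes / GIB:.2f} GiB")
```

With these placeholder numbers, halving the bytes per element halves the cache, which is the room that lets a longer context fit. Shrinking the context length scales the cache down the same way.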
Thanks!
HoagyD changed discussion status to closed