Abnormally Large Memory Footprint?
#2 · by RylanSchaeffer · opened
I'm loading the model with `torch_dtype=torch.float16`, but I'm finding that the memory footprint is 2-4x larger than comparable 7B and 8B language models. I also noticed that the return type is `float32`. Is something converting the outputs into `float32`, and maybe causing the model to run in `float32`?
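For anyone hitting the same thing, here is a minimal sketch (the model id is a placeholder, not the actual checkpoint from this repo) of how to check whether the weights and the forward-pass outputs actually stay in float16:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/your-7b-model"  # hypothetical model id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

print(model.dtype)                   # expect torch.float16
print(model.get_memory_footprint())  # weights-only footprint in bytes

inputs = tokenizer("Hello world", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# float32 logits here don't necessarily mean the whole model ran in float32;
# many models upcast only the final logits / softmax for numerical stability.
print(outputs.logits.dtype)
```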
I found the problem: `"padding": "max_length"`. The other 7B and 8B models were padded to the longest sequence in the batch, not to the tokenizer's max length.
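For reference, a short sketch (again with a placeholder model id) of how much the two padding strategies differ: `padding="max_length"` pads every sequence up to `tokenizer.model_max_length`, while `padding=True` (or `"longest"`) only pads to the longest sequence in the batch, so the activation memory scales with the actual prompt lengths instead of the context window:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("your-org/your-7b-model")  # hypothetical id

# Some tokenizers (e.g. Llama-style) ship without a pad token.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

texts = ["short prompt", "a slightly longer prompt for comparison"]

batch_longest = tokenizer(texts, padding=True, return_tensors="pt")
batch_max_len = tokenizer(texts, padding="max_length", return_tensors="pt")

print(batch_longest["input_ids"].shape)  # (2, length of the longest sequence)
print(batch_max_len["input_ids"].shape)  # (2, tokenizer.model_max_length) -- far larger

# Note: if the tokenizer reports an unrealistically large model_max_length,
# pass max_length=... explicitly when using padding="max_length".
```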
Is your problem solved?