Abnormally Large Memory Footprint?

#2
by RylanSchaeffer - opened

I'm loading the model with torch_dtype=torch.float16, but I'm finding that its memory footprint is 2-4x larger than comparable 7B and 8B language models. I also noticed that the outputs are returned in float32. Is something converting the outputs to float32 and perhaps causing the model to run in float32?
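For reference, here's a minimal sketch of what I'm checking (the model id and prompt are placeholders, not my exact script):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model id -- substitute the checkpoint from this repo.
model_id = "org/model-7b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# The parameters do load in float16 as requested ...
print(next(model.parameters()).dtype)  # torch.float16

# ... but on my end the returned logits come back as float32.
inputs = tokenizer("Hello world", return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.logits.dtype)
```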

I found the problem: padding="max_length". The other 7B and 8B models were padded to the longest sequence in the batch, not to the tokenizer's max length. See the sketch below.
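A minimal sketch of the difference, assuming the padding argument is passed to the tokenizer call (the model id and texts are illustrative):

```python
from transformers import AutoTokenizer

# Placeholder model id and inputs -- illustrative only.
tokenizer = AutoTokenizer.from_pretrained("org/model-7b")
texts = ["a short prompt", "another short prompt"]

# Before: padding every example to the tokenizer's max length inflates the
# sequence dimension (and hence activation memory) regardless of input size.
padded_to_max = tokenizer(
    texts, padding="max_length", truncation=True, return_tensors="pt"
)
print(padded_to_max["input_ids"].shape)  # (2, tokenizer.model_max_length)

# After: padding only to the longest sequence in the batch, as the other
# 7B/8B models were configured, keeps memory proportional to the actual input.
padded_to_longest = tokenizer(
    texts, padding="longest", truncation=True, return_tensors="pt"
)
print(padded_to_longest["input_ids"].shape)  # (2, length of the longest prompt)
```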

Owner

Is your problem solved?