Only 4 GB of memory used when inferring, but 38 GB when training

#15
by hayj - opened

Is it normal that training takes much more GPU memory than inference, or am I using it incorrectly?
I'm using an NVIDIA A100.

Yes, this is normal. During training, the framework must also store gradients, optimizer states, and intermediate activations, which together can be several times larger than the model weights alone.

Please refer to https://huggingface.co./docs/transformers/v4.20.1/en/perf_train_gpu_one#anatomy-of-models-memory for more details.
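As a rough back-of-the-envelope sketch of why this happens (assuming fp32 weights and a standard Adam-style optimizer; activation memory is workload-dependent and not counted here, so real training usage is higher still):

```python
def estimate_static_memory_gb(num_params: int, bytes_per_param: int = 4) -> dict:
    """Estimate the *static* GPU memory for a model (excluding activations)."""
    weights = num_params * bytes_per_param
    # Training also keeps one gradient per parameter...
    gradients = num_params * bytes_per_param
    # ...plus two fp32 Adam states (momentum and variance) per parameter.
    optimizer_states = num_params * 4 * 2
    gb = 1024 ** 3
    return {
        "inference_gb": weights / gb,
        "training_gb": (weights + gradients + optimizer_states) / gb,
    }

# Hypothetical example: a 1-billion-parameter model in fp32.
mem = estimate_static_memory_gb(1_000_000_000)
print(f"inference: {mem['inference_gb']:.1f} GB, training: {mem['training_gb']:.1f} GB")
```

With these assumptions, the static training footprint is already about 4x the inference footprint before any activations are allocated, which is consistent with seeing a much larger gap once activation memory for large batches and long sequences is added.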
