How to set grad checkpointing?
#6 by mactavish91 - opened
How do I enable gradient checkpointing for this model? Without it, GPU memory usage during training is too high.
Could you check the guide here? https://huggingface.co/docs/transformers/main/en/perf_train_gpu_one#gradient-checkpointing
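In case it helps, here is a rough sketch of what that guide boils down to, assuming the model is a regular `transformers` `PreTrainedModel`. The tiny randomly initialised GPT-2 below is only a stand-in so the snippet runs without downloading anything; it is not the model from this repo, and whether this repo's model supports checkpointing is something you would need to confirm yourself.

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Stand-in model (assumption): a tiny, randomly initialised GPT-2 so the
# snippet runs without downloading weights. Swap in your own model here.
config = GPT2Config(n_layer=2, n_head=2, n_embd=32)
model = GPT2LMHeadModel(config)

# Recompute activations during the backward pass instead of keeping them
# in memory; this trades extra compute for lower GPU memory usage.
model.gradient_checkpointing_enable()

# The key/value cache is only useful for generation, so it is commonly
# turned off when training with checkpointing.
model.config.use_cache = False

# Flag reporting whether checkpointing is active on the model.
checkpointing_active = model.is_gradient_checkpointing
```

If you train with `Trainer`, passing `gradient_checkpointing=True` to `TrainingArguments` should have the same effect without calling the method yourself.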