I couldn't train this on my 12 GB GPU, even with a batch size of 1. I think the model is simply too big to fit in memory.
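
A rough way to sanity-check that guess is to estimate the memory that does not shrink with batch size: weights, gradients, and optimizer state. The sketch below assumes fp32 training with an Adam-style optimizer (two extra state tensors per parameter); neither is stated above, and the 1B parameter count is a hypothetical placeholder, not the actual model size. Activations and framework overhead come on top of this figure.

```python
def estimate_training_memory_gib(num_params, bytes_per_param=4, optimizer_states_per_param=2):
    """Rough lower bound on training memory in GiB, ignoring activations.

    Assumes every parameter stores one weight, one gradient, and
    `optimizer_states_per_param` optimizer values (2 for Adam-style optimizers),
    all at `bytes_per_param` bytes each.
    """
    weights = num_params * bytes_per_param
    gradients = num_params * bytes_per_param
    optimizer_state = num_params * bytes_per_param * optimizer_states_per_param
    return (weights + gradients + optimizer_state) / 1024**3


# Hypothetical example: a 1-billion-parameter model (placeholder value).
needed = estimate_training_memory_gib(1_000_000_000)
print(f"~{needed:.1f} GiB before activations; fits in 12 GiB: {needed < 12}")
```

If the estimate already exceeds the card's memory, lowering the batch size cannot help, which would match what I saw.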