How do I fine-tune this model?
Hey there, I am interested in fine-tuning this bert-base-uncased model. How can I do it?
I found this tutorial https://huggingface.co./docs/transformers/training, but it focuses on finetuning a prediction head rather than the backbone weights.
I would like to:
- fine-tune the backbone weights on a large corpus of text from my domain,
- train a prediction head on a more limited dataset from my domain.
Is that possible?
Hey Ethan! Would love to chat on this if you have a few minutes to spare. Please let me know :)
@nikharanirghin What do you want to chat about?
Feedback on finetuning bert!
Can you help me too? I want to fine-tune this model as well.
You can fine-tune the entire model simply by training it end to end with a very low learning rate.
When you load the pretrained model, all parameters have requires_grad=True by default, so every layer is updated during backpropagation; model.train() only switches layers such as dropout into training mode.
For example:
def single_training_epoch(model, optimizer, train_dataloader):
    model.train()
    # Loop over the training set
    for input_ids, attention_masks, labels in train_dataloader:
        # Clear the gradients
        optimizer.zero_grad()
        # Forward pass
        outputs = model(input_ids, attention_mask=attention_masks, labels=labels)
        loss = outputs[0]
        # Backward pass
        loss.backward()
        optimizer.step()
    return model, optimizer
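To make the loop above concrete, here is a minimal sketch of the setup around it. The tiny randomly initialised config, the toy tensors, and the learning rate of 2e-5 are all assumptions chosen so the snippet runs offline; with the real model you would load it via BertForSequenceClassification.from_pretrained("bert-base-uncased") and feed tokenized text instead.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import BertConfig, BertForSequenceClassification

# Assumption: a tiny, randomly initialised BERT so the sketch runs without a download.
# In practice: BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64, num_labels=2)
model = BertForSequenceClassification(config)

# Hypothetical toy data: 16 sequences of 10 token ids with binary labels.
input_ids = torch.randint(0, 100, (16, 10))
attention_masks = torch.ones_like(input_ids)
labels = torch.randint(0, 2, (16,))
train_dataloader = DataLoader(
    TensorDataset(input_ids, attention_masks, labels), batch_size=8
)

# A low learning rate (2e-5 is an assumed starting point) keeps the
# updates to the pretrained weights small.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
losses = []
for batch_ids, batch_masks, batch_labels in train_dataloader:
    optimizer.zero_grad()
    outputs = model(batch_ids, attention_mask=batch_masks, labels=batch_labels)
    outputs.loss.backward()   # gradients flow into every layer, backbone included
    optimizer.step()
    losses.append(outputs.loss.item())
```

Since no parameters are frozen here, both the backbone and the classification head are updated on every step.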
You can also manually set each layer (True = updates/trains, False = do not update during back-prop) using a loop:
for param in model.bert.parameters():
    param.requires_grad = True
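The same loop with False gives the opposite setup: a frozen backbone with only the prediction head training. A minimal sketch, again assuming a tiny randomly initialised config as a stand-in for the pretrained checkpoint, and an assumed head learning rate of 1e-3:

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

# Assumption: tiny random config so this runs offline; swap in
# BertForSequenceClassification.from_pretrained("bert-base-uncased") for real use.
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64, num_labels=2)
model = BertForSequenceClassification(config)

# Freeze the backbone: these weights will not receive updates.
for param in model.bert.parameters():
    param.requires_grad = False

# Only the classification head remains trainable.
trainable = [name for name, p in model.named_parameters() if p.requires_grad]

# Pass only the trainable parameters to the optimizer.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
```

Listing the trainable parameter names before training is a cheap way to confirm that the layers you intended to freeze really are frozen.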
My recommendation from there is to save the fine-tuned model and then train a separate model that takes the output from BERT as its input.
This enables a few things:
- WAY faster training/retraining of your 'prediction head' model as you can run the data through the previous model a single time and then train your smaller model.
- Easier to retrain and experiment with different architectures of your 'prediction head' without even interacting with the BERT model.
- Able to add additional values into your model (such as ints and floats that could be present in other data fields, or features you've created yourself)
The one drawback is slightly slower inference time... but this can be mitigated by creating a proper pipeline (or a more advanced method would be to load them separately with their weights and merge the models together).
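A rough sketch of that two-stage approach: run the text through BERT once, cache the embeddings, then train a small head on those embeddings plus extra numeric features. The tiny random config, the toy tensors, the three extra feature columns, and the head architecture are all hypothetical; using the [CLS] position as the sentence embedding is one common choice, not the only one.

```python
import torch
from torch import nn
from transformers import BertConfig, BertModel

# Assumption: tiny random BERT so the sketch runs offline; in practice load
# your saved fine-tuned backbone with BertModel.from_pretrained(...).
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64)
bert = BertModel(config)
bert.eval()

# Hypothetical toy data: token ids, 3 extra numeric features, binary labels.
input_ids = torch.randint(0, 100, (16, 10))
attention_mask = torch.ones_like(input_ids)
extra_features = torch.rand(16, 3)
labels = torch.randint(0, 2, (16,))

# Stage 1: run the data through BERT a single time and keep the [CLS] embeddings.
with torch.no_grad():
    cls_embeddings = bert(input_ids, attention_mask=attention_mask).last_hidden_state[:, 0]

# Stage 2: join the embeddings with the extra features and train a small head.
features = torch.cat([cls_embeddings, extra_features], dim=1)
head = nn.Sequential(nn.Linear(features.shape[1], 16), nn.ReLU(), nn.Linear(16, 2))
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for _ in range(5):
    optimizer.zero_grad()
    loss = loss_fn(head(features), labels)
    loss.backward()
    optimizer.step()
```

Because the features tensor is computed once, retraining or swapping the head never touches BERT again, which is where the speed-up comes from.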
May I know what the output shape of the model is?
When I train it, I get this error:
Target size (torch.Size([8, 6])) must be the same as input size (torch.Size([8, 2]))
I want to adjust the input shape to match the expected shape.
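One likely cause of that message, though not the only possible one: the labels have 6 columns per example while the classification head still emits the default 2 logits. If this is a multi-label task with 6 labels (an assumption based only on the shapes in the error), a sketch of the fix is to set num_labels=6 and problem_type="multi_label_classification" when creating the model. The tiny random config below is a stand-in so the snippet runs offline; the same two arguments can be passed to from_pretrained("bert-base-uncased", ...).

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

# Assumption: tiny random config for an offline demo. With the real checkpoint:
# BertForSequenceClassification.from_pretrained(
#     "bert-base-uncased", num_labels=6, problem_type="multi_label_classification")
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64,
                    num_labels=6, problem_type="multi_label_classification")
model = BertForSequenceClassification(config)

# Hypothetical batch of 8 examples, each with 6 independent 0/1 labels.
input_ids = torch.randint(0, 100, (8, 10))
attention_mask = torch.ones_like(input_ids)
labels = torch.randint(0, 2, (8, 6)).float()  # multi-label targets must be float

outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
logits_shape = tuple(outputs.logits.shape)  # now matches the label shape
```

If the task is instead single-label with 6 classes, the labels should be a 1-D tensor of class indices with shape [8] rather than a [8, 6] matrix.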