bert-large-uncased-swag

This model is a fine-tuned version of google-bert/bert-large-uncased on the SWAG dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4643
  • Accuracy: 0.8295

Model description

Intended uses & limitations

This model should be used as an expert in the Meteor-of-LoRA framework.

Training and evaluation data

The data follow the default splits of the Hugging Face `swag` dataset:

```python
from datasets import load_dataset

dataset = load_dataset("swag")
```
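The card does not describe how SWAG records were turned into model inputs. The sketch below assumes the pairing used in the Transformers multiple-choice example: the context `sent1` is repeated for each of the four candidates, and each second segment is `sent2` followed by one `ending0`…`ending3`. The helper name `build_choice_pairs` and the toy record are hypothetical and exist only for illustration.

```python
from typing import Dict, List, Tuple

# SWAG provides four candidate endings per example.
ENDING_KEYS = [f"ending{i}" for i in range(4)]


def build_choice_pairs(example: Dict[str, object]) -> Tuple[List[Tuple[str, str]], int]:
    """Turn one SWAG record into four (context, candidate) pairs plus the gold label.

    Assumption: mirrors the Transformers multiple-choice example
    (sent1 as the first segment, sent2 + ending as the second); the card
    does not state the exact preprocessing used for this adapter.
    """
    context = str(example["sent1"])
    header = str(example["sent2"])
    pairs = [(context, f"{header} {example[key]}") for key in ENDING_KEYS]
    return pairs, int(example["label"])


# Toy record using the SWAG field names; the text is invented for illustration.
toy = {
    "sent1": "A woman is standing at a kitchen counter.",
    "sent2": "She",
    "ending0": "picks up a knife and slices the bread.",
    "ending1": "drives the car onto the highway.",
    "ending2": "swims across the lake.",
    "ending3": "climbs onto the roof.",
    "label": 0,
}

pairs, label = build_choice_pairs(toy)
for first, second in pairs:
    print(first, "|", second)
print("gold:", label)
```

Each pair would then be tokenized as a sentence pair, and the four encodings stacked into a (batch, 4, seq_len) tensor, which is the input layout `BertForMultipleChoice` expects.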

Training procedure

For parameter efficiency, our approach adapts the query (Wq) and value (Wv) projection weights in the Transformer's attention modules.
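The card does not report the adapter rank or scaling, so the sketch below assumes LoRA-style low-rank updates (as the Meteor-of-LoRA setting suggests) with hypothetical values r = 8 and alpha = 16. It illustrates why adapting only Wq and Wv is cheap: the frozen d × d weight W0 is left untouched, and only the low-rank factors A (r × d) and B (d × r) are trained, adding r(d + d) parameters per matrix instead of d². In BERT these projections are the `query` and `value` linear layers inside each self-attention block, which is what PEFT's `target_modules=["query", "value"]` would select. The class and function names here are hypothetical.

```python
import numpy as np


class LoRALinear:
    """Hypothetical LoRA layer: y = W0 x + (alpha / r) * B A x, with W0 frozen."""

    def __init__(self, w0: np.ndarray, r: int = 8, alpha: float = 16.0, seed: int = 0):
        d_out, d_in = w0.shape
        rng = np.random.default_rng(seed)
        self.w0 = w0                                    # frozen pre-trained weight (e.g. Wq or Wv)
        self.a = rng.normal(0.0, 0.01, size=(r, d_in))  # trainable down-projection A
        self.b = np.zeros((d_out, r))                   # trainable up-projection B, zero-initialised
        self.scale = alpha / r

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x: (batch, d_in) -> (batch, d_out); the low-rank path is added to the frozen one.
        return x @ self.w0.T + self.scale * (x @ self.a.T) @ self.b.T

    def trainable_params(self) -> int:
        return self.a.size + self.b.size


def lora_param_count(hidden: int, layers: int, r: int, matrices_per_layer: int = 2) -> int:
    """Trainable LoRA parameters when adapting square hidden x hidden projections."""
    return layers * matrices_per_layer * r * (hidden + hidden)


hidden, layers = 1024, 24  # bert-large: 24 layers, hidden size 1024
wq = np.random.default_rng(1).normal(size=(hidden, hidden))
lora_q = LoRALinear(wq, r=8)
x = np.random.default_rng(2).normal(size=(2, hidden))

# Because B starts at zero, the adapted layer initially reproduces the frozen one.
print(np.allclose(lora_q(x), x @ wq.T))
print(lora_q.trainable_params())          # one adapted matrix
print(lora_param_count(hidden, layers, 8))  # Wq + Wv across all 24 layers
```

With the assumed r = 8, adapting Wq and Wv in all 24 layers trains 786,432 parameters (excluding any classifier head), against 50,331,648 in the frozen Wq and Wv matrices themselves, roughly 1.6%.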

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 5
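The `linear` scheduler decays the learning rate from 5e-05 to zero over the run. No warmup is listed, so the sketch below assumes zero warmup steps; its shape follows Transformers' `get_linear_schedule_with_warmup`, but the helper `linear_lr` is hypothetical. The step count is derived from the results table: the last row logs step 22,500 at epoch 4.8945, which implies about 4,597 optimizer steps per epoch and about 22,985 steps over 5 epochs (assuming the logged epoch fraction is exact).

```python
def linear_lr(step: int, total_steps: int, base_lr: float = 5e-5, warmup_steps: int = 0) -> float:
    """Linear warmup to base_lr, then linear decay to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


steps_per_epoch = round(22500 / 4.8945)  # derived from the last row of the results table
total_steps = 5 * steps_per_epoch        # num_epochs = 5

for step in (0, 5000, 11000, 22500, total_steps):
    print(step, f"{linear_lr(step, total_steps):.2e}")
```

Under these assumptions the learning rate is already close to zero (about 1e-06) by the final logged evaluation, which is consistent with the small changes in the last few table rows.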

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| 1.2132 | 0.1088 | 500 | 0.8717 | 0.6959 |
| 0.908 | 0.2175 | 1000 | 0.7149 | 0.7473 |
| 0.8353 | 0.3263 | 1500 | 0.6474 | 0.7575 |
| 0.8075 | 0.4351 | 2000 | 0.6142 | 0.7798 |
| 0.8011 | 0.5438 | 2500 | 0.5785 | 0.7867 |
| 0.7727 | 0.6526 | 3000 | 0.5643 | 0.7936 |
| 0.7647 | 0.7614 | 3500 | 0.5698 | 0.7956 |
| 0.7731 | 0.8701 | 4000 | 0.5453 | 0.8011 |
| 0.7489 | 0.9789 | 4500 | 0.5336 | 0.8052 |
| 0.7496 | 1.0877 | 5000 | 0.5431 | 0.8033 |
| 0.735 | 1.1964 | 5500 | 0.5231 | 0.8083 |
| 0.7194 | 1.3052 | 6000 | 0.5147 | 0.8096 |
| 0.7307 | 1.4140 | 6500 | 0.5102 | 0.8112 |
| 0.7355 | 1.5227 | 7000 | 0.5223 | 0.8133 |
| 0.7085 | 1.6315 | 7500 | 0.5054 | 0.8142 |
| 0.7206 | 1.7403 | 8000 | 0.5026 | 0.8157 |
| 0.7143 | 1.8490 | 8500 | 0.5126 | 0.8144 |
| 0.7045 | 1.9578 | 9000 | 0.5035 | 0.8162 |
| 0.6972 | 2.0666 | 9500 | 0.4948 | 0.8178 |
| 0.6885 | 2.1753 | 10000 | 0.4890 | 0.8202 |
| 0.7079 | 2.2841 | 10500 | 0.4910 | 0.8193 |
| 0.6874 | 2.3929 | 11000 | 0.4907 | 0.8222 |
| 0.6832 | 2.5016 | 11500 | 0.4875 | 0.8217 |
| 0.6807 | 2.6104 | 12000 | 0.4824 | 0.8224 |
| 0.6865 | 2.7192 | 12500 | 0.4877 | 0.8227 |
| 0.6863 | 2.8279 | 13000 | 0.4821 | 0.8232 |
| 0.6913 | 2.9367 | 13500 | 0.4914 | 0.8229 |
| 0.6996 | 3.0455 | 14000 | 0.4843 | 0.8241 |
| 0.687 | 3.1542 | 14500 | 0.4753 | 0.8250 |
| 0.6896 | 3.2630 | 15000 | 0.4762 | 0.8251 |
| 0.6745 | 3.3718 | 15500 | 0.4753 | 0.8242 |
| 0.6735 | 3.4805 | 16000 | 0.4713 | 0.8267 |
| 0.6764 | 3.5893 | 16500 | 0.4715 | 0.8259 |
| 0.6521 | 3.6981 | 17000 | 0.4669 | 0.8285 |
| 0.6686 | 3.8068 | 17500 | 0.4726 | 0.8269 |
| 0.6721 | 3.9156 | 18000 | 0.4703 | 0.8273 |
| 0.6682 | 4.0244 | 18500 | 0.4660 | 0.8274 |
| 0.6533 | 4.1331 | 19000 | 0.4690 | 0.8281 |
| 0.6547 | 4.2419 | 19500 | 0.4697 | 0.8282 |
| 0.6589 | 4.3507 | 20000 | 0.4640 | 0.8291 |
| 0.6518 | 4.4594 | 20500 | 0.4638 | 0.8294 |
| 0.6739 | 4.5682 | 21000 | 0.4669 | 0.8285 |
| 0.6763 | 4.6770 | 21500 | 0.4628 | 0.8304 |
| 0.6503 | 4.7857 | 22000 | 0.4640 | 0.8296 |
| 0.6659 | 4.8945 | 22500 | 0.4643 | 0.8295 |
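The headline results at the top of this card (loss 0.4643, accuracy 0.8295) match the final logged evaluation at step 22,500. The short sketch below, which copies the rows logged during the fifth epoch from the results table, compares that final row with the best checkpoint by each metric. In this log, both the highest accuracy (0.8304) and the lowest validation loss (0.4628) occur at step 21,500, slightly before the end of training.

```python
# (step, validation_loss, accuracy) for the rows logged during the fifth epoch,
# copied from the training results table.
rows = [
    (18500, 0.4660, 0.8274),
    (19000, 0.4690, 0.8281),
    (19500, 0.4697, 0.8282),
    (20000, 0.4640, 0.8291),
    (20500, 0.4638, 0.8294),
    (21000, 0.4669, 0.8285),
    (21500, 0.4628, 0.8304),
    (22000, 0.4640, 0.8296),
    (22500, 0.4643, 0.8295),
]

best_by_acc = max(rows, key=lambda row: row[2])   # highest validation accuracy
best_by_loss = min(rows, key=lambda row: row[1])  # lowest validation loss
final = rows[-1]                                  # last logged evaluation

print("best accuracy:", best_by_acc)
print("best loss:    ", best_by_loss)
print("final:        ", final)
```

The gap between the best and final checkpoints is under 0.001 accuracy here. If checkpoint selection were wanted, Transformers' `TrainingArguments` offers `load_best_model_at_end` together with `metric_for_best_model`; the card does not say whether either was used.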

Framework versions

  • PEFT 0.12.1.dev0
  • Transformers 4.45.0.dev0
  • Pytorch 2.3.1+cu121
  • Datasets 2.21.0
  • Tokenizers 0.19.1

Model tree

yefo-ufpe/bert-large-uncased-swag is a PEFT adapter for google-bert/bert-large-uncased.