image-captioning-Vit-GPT2-Flickr8k

This model is a fine-tuned version of nlpconnect/vit-gpt2-image-captioning, presumably on the Flickr8k dataset as the model name indicates (the training data is not otherwise documented). It achieves the following results on the evaluation set:

  • Loss: 0.4624
  • ROUGE-1: 38.4609
  • ROUGE-2: 14.1268
  • ROUGE-L: 35.4304
  • ROUGE-Lsum: 35.3910
  • Gen Len (average generated length): 12.1355
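The ROUGE scores above are token-overlap metrics between generated and reference captions. As a rough illustration of what ROUGE-1 measures, here is a simplified unigram-overlap F1 sketch (the actual `rouge_score` implementation also applies stemming and computes ROUGE-L via longest common subsequence, so real scores will differ):

```python
from collections import Counter

def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap between a
    candidate caption and a single reference (no stemming)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Hypothetical caption pair for illustration:
score = rouge1_f1("a dog runs on the grass", "a brown dog is running on grass")
print(round(score * 100, 1))  # ≈ 61.5
```

ROUGE is reported here scaled by 100, so the evaluation-set ROUGE-1 of 38.4609 corresponds to an F1 of about 0.385.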

Model description

The base checkpoint, nlpconnect/vit-gpt2-image-captioning, is a VisionEncoderDecoderModel pairing a ViT image encoder with a GPT-2 text decoder for image captioning. Further details on this fine-tune have not yet been documented.

Intended uses & limitations

More information needed
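While detailed usage guidance is pending, a minimal inference sketch using the standard VisionEncoderDecoderModel API from Transformers might look like this (untested here; `example.jpg` and the generation settings are illustrative assumptions):

```python
# Hedged sketch: standard VisionEncoderDecoder captioning usage.
from PIL import Image
from transformers import AutoTokenizer, ViTImageProcessor, VisionEncoderDecoderModel

model_id = "NourFakih/image-captioning-Vit-GPT2-Flickr8k"
model = VisionEncoderDecoderModel.from_pretrained(model_id)
processor = ViTImageProcessor.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load any RGB image; "example.jpg" is a placeholder path.
image = Image.open("example.jpg").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# max_length=16 matches the ~12-token Gen Len reported above; beam count is a guess.
output_ids = model.generate(pixel_values, max_length=16, num_beams=4)
caption = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(caption)
```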

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3.0
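The linear scheduler decays the learning rate from 5e-05 toward zero over training. A minimal sketch of that schedule (mirroring the shape of Transformers' `get_linear_schedule_with_warmup`; the total-step count is an estimate from the results table, where step 24,000 falls at epoch 2.97):

```python
def linear_lr(step: int, total_steps: int, base_lr: float = 5e-05,
              warmup_steps: int = 0) -> float:
    """Linear schedule: ramp up over warmup_steps (none used here),
    then decay linearly to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

total = 24_242  # estimated: ~8,081 optimizer steps/epoch x 3 epochs
print(linear_lr(0, total))      # full rate at the start (no warmup)
print(linear_lr(total, total))  # decayed to zero by the end
```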

Training results

Training Loss Epoch Step Validation Loss ROUGE-1 ROUGE-2 ROUGE-L ROUGE-Lsum Gen Len
0.5495 0.06 500 0.4942 35.0812 11.7357 32.4228 32.4251 11.5738
0.4945 0.12 1000 0.4903 35.4943 12.0207 32.8571 32.8486 11.8682
0.4984 0.19 1500 0.4862 35.3652 11.9707 32.8296 32.8126 12.0544
0.4783 0.25 2000 0.4808 36.1048 12.3597 33.4635 33.4504 11.3468
0.4736 0.31 2500 0.4772 35.9342 12.343 33.519 33.495 11.1066
0.4685 0.37 3000 0.4708 36.8985 13.0743 34.3294 34.2978 11.4739
0.4687 0.43 3500 0.4704 36.1934 12.5721 33.4731 33.4671 11.9201
0.4709 0.49 4000 0.4696 36.1822 12.8306 33.4001 33.3673 12.1733
0.4575 0.56 4500 0.4675 37.4471 13.7553 34.5655 34.5384 12.6302
0.4484 0.62 5000 0.4662 36.6786 13.0601 33.9348 33.8999 12.6007
0.4507 0.68 5500 0.4656 36.506 12.7992 34.0665 34.0409 11.4316
0.4445 0.74 6000 0.4628 37.0737 13.3324 34.416 34.3902 12.3211
0.4557 0.8 6500 0.4594 37.3349 13.1633 34.4709 34.4503 12.2522
0.4451 0.87 7000 0.4600 37.3384 13.5699 34.6726 34.6555 12.0494
0.4381 0.93 7500 0.4588 37.6164 13.7855 34.8467 34.8084 12.1347
0.4357 0.99 8000 0.4571 37.2047 13.4341 34.3383 34.3121 12.2670
0.3869 1.05 8500 0.4612 37.684 13.6922 34.9914 34.9721 11.3216
0.377 1.11 9000 0.4616 37.2615 13.2059 34.3375 34.3327 12.3221
0.3736 1.17 9500 0.4607 37.2109 13.1387 34.3923 34.3638 11.8274
0.3801 1.24 10000 0.4617 38.0033 13.7561 35.2434 35.2414 11.6079
0.3816 1.3 10500 0.4599 37.3453 13.622 34.6495 34.639 12.2101
0.377 1.36 11000 0.4619 37.2996 13.4583 34.3777 34.3525 12.3911
0.3745 1.42 11500 0.4604 37.5448 13.3841 34.5785 34.5532 12.2747
0.3785 1.48 12000 0.4568 38.0769 14.0089 35.0744 35.0605 12.3179
0.3675 1.54 12500 0.4587 37.6284 13.8277 34.7837 34.7618 11.8732
0.3731 1.61 13000 0.4554 38.433 14.1461 35.6757 35.6683 11.4294
0.3731 1.67 13500 0.4548 37.9065 13.7526 34.9091 34.8919 12.1241
0.371 1.73 14000 0.4542 38.4064 14.2136 35.4845 35.4671 12.1014
0.3615 1.79 14500 0.4551 38.0695 14.1042 35.162 35.1427 12.1135
0.3687 1.85 15000 0.4550 38.1978 14.1243 35.3107 35.2821 12.2255
0.3711 1.92 15500 0.4532 37.661 13.603 34.7601 34.7467 12.1632
0.3685 1.98 16000 0.4515 38.5727 14.5345 35.5855 35.5585 11.9162
0.3333 2.04 16500 0.4626 38.4657 14.4726 35.6431 35.6119 11.9506
0.3129 2.1 17000 0.4660 38.2002 14.0689 35.1851 35.1748 12.3313
0.3155 2.16 17500 0.4674 37.8919 13.91 34.9167 34.9154 12.4853
0.3134 2.22 18000 0.4644 38.1576 13.9371 35.0486 35.0252 11.9748
0.3167 2.29 18500 0.4653 37.8516 13.9029 34.7959 34.7847 12.5273
0.322 2.35 19000 0.4673 37.9883 14.0127 34.8667 34.841 12.4680
0.312 2.41 19500 0.4641 38.4611 14.238 35.4465 35.417 11.9315
0.3173 2.47 20000 0.4654 38.1477 13.9164 35.1148 35.0905 12.4845
0.3081 2.53 20500 0.4640 38.7153 14.3282 35.7048 35.6923 11.8932
0.3093 2.6 21000 0.4633 38.2932 14.0961 35.2736 35.2308 11.8932
0.3154 2.66 21500 0.4637 38.0708 13.7374 35.0722 35.055 12.1310
0.3096 2.72 22000 0.4630 38.3722 14.041 35.2847 35.2425 12.2591
0.3101 2.78 22500 0.4627 38.6372 14.2961 35.5118 35.4819 12.2836
0.309 2.84 23000 0.4620 38.3596 14.0396 35.3285 35.3 12.3281
0.312 2.9 23500 0.4623 38.4268 14.0768 35.4015 35.3656 12.2208
0.3135 2.97 24000 0.4624 38.4609 14.1268 35.4304 35.391 12.1355

Framework versions

  • Transformers 4.39.3
  • PyTorch 2.1.2
  • Datasets 2.18.0
  • Tokenizers 0.15.2
Model size

  • 239M parameters (F32, stored as safetensors)
