ViT-GPT2-FlowerCaptioner

This model is a fine-tuned version of nlpconnect/vit-gpt2-image-captioning on the FlowerEvolver-dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4930
  • Rouge1: 68.3498
  • Rouge2: 46.7534
  • Rougel: 62.3763
  • Rougelsum: 65.9575
  • Gen Len: 49.82
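
For readers unfamiliar with the metrics above, ROUGE-1 measures unigram overlap between a generated caption and its reference. The following is a minimal sketch of the F-measure, assuming lowercase whitespace tokenization; `rouge1_f` is a hypothetical helper, and the scores reported here were presumably produced by a standard ROUGE implementation and scaled to 0-100, so this simplified version will not reproduce them exactly.

```python
from collections import Counter


def rouge1_f(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F-measure (assumption: whitespace tokens, no stemming)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped unigram overlap: each token counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


score = rouge1_f("a flower with 12 petals", "a flower with 6 petals")
print(round(score * 100, 2))
```

Four of the five tokens match in each direction, so precision and recall are both 0.8 in this toy example.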

Sample running code

With Python

import torch
from transformers import pipeline

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
FlowerCaptioner = pipeline("image-to-text", model="cristianglezm/ViT-GPT2-FlowerCaptioner", device=device)
FlowerCaptioner(["flower1.png"])
# A flower with 12 petals in a smooth gradient of green and blue. 
# The center is green with black accents. The stem is long and green.
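
The pipeline call above returns structured results rather than a bare string. As a rough sketch, assuming the usual `image-to-text` output shape (a list of `{"generated_text": ...}` dicts for one image, or one such list per image for a batch), a small helper can pull out the caption text. `extract_captions` is a hypothetical name, not part of transformers.

```python
def extract_captions(outputs):
    """Return one caption string per image from image-to-text pipeline output.

    Assumption: a batch call yields one list of candidate dicts per image,
    while a single-image call yields the candidate dicts directly.
    """
    captions = []
    for item in outputs:
        candidates = item if isinstance(item, list) else [item]
        # Keep only the first candidate for each image.
        captions.append(candidates[0]["generated_text"].strip())
    return captions


# Stand-in for FlowerCaptioner(["flower1.png"]) so the sketch runs offline
fake_outputs = [[{"generated_text": "A flower with 12 petals. "}]]
print(extract_captions(fake_outputs))
```

With the real pipeline, this would be called as `extract_captions(FlowerCaptioner(["flower1.png"]))`.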

With JavaScript

import { pipeline } from '@xenova/transformers';

// Allocate a pipeline for image-to-text
let pipe = await pipeline('image-to-text', 'cristianglezm/ViT-GPT2-FlowerCaptioner-ONNX');

let out = await pipe('flower image url');
// A flower with 12 petals in a smooth gradient of green and blue. 
// The center is green with black accents. The stem is long and green.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 25
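
To make the scheduler settings above concrete, here is a minimal sketch of a linear schedule with warmup: the rate rises linearly from 0 to 5e-05 over the first 500 steps, then decays linearly to 0. The total of 2500 steps is taken from the training results table (100 steps per epoch for 25 epochs). `linear_lr` is a hypothetical helper written for illustration, not the code used during training.

```python
def linear_lr(step, base_lr=5e-5, warmup_steps=500, total_steps=2500):
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < warmup_steps:
        # Warmup phase: scale up linearly from 0 to base_lr.
        return base_lr * step / warmup_steps
    # Decay phase: scale down linearly so the rate reaches 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / (total_steps - warmup_steps)


for step in (0, 250, 500, 1500, 2500):
    print(step, linear_lr(step))
```

Under these settings the peak rate is reached at step 500 (the end of epoch 5), and the rate is back to half the peak by step 1500.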

Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len |
|---|---|---|---|---|---|---|---|---|
| 0.6986 | 1.0 | 100 | 0.5339 | 64.9813 | 42.4686 | 58.2586 | 63.3933 | 47.25 |
| 0.3408 | 2.0 | 200 | 0.3263 | 67.5461 | 46.5219 | 62.7962 | 65.6509 | 47.39 |
| 0.2797 | 3.0 | 300 | 0.2829 | 65.0704 | 42.0682 | 58.4268 | 63.2368 | 56.8 |
| 0.2584 | 4.0 | 400 | 0.2588 | 65.5074 | 45.227 | 60.2469 | 63.4253 | 52.25 |
| 0.2589 | 5.0 | 500 | 0.2607 | 66.7346 | 45.8264 | 61.7373 | 64.8857 | 50.64 |
| 0.2179 | 6.0 | 600 | 0.2697 | 63.8334 | 42.997 | 58.1585 | 61.7704 | 52.43 |
| 0.1662 | 7.0 | 700 | 0.2631 | 68.6188 | 48.3329 | 63.9474 | 66.6006 | 46.94 |
| 0.161 | 8.0 | 800 | 0.2749 | 69.0046 | 48.1421 | 63.7844 | 66.8317 | 49.74 |
| 0.1207 | 9.0 | 900 | 0.3117 | 70.0357 | 48.9002 | 64.416 | 67.7582 | 48.66 |
| 0.0909 | 10.0 | 1000 | 0.3408 | 65.9578 | 45.2324 | 60.2838 | 63.7493 | 46.92 |
| 0.0749 | 11.0 | 1100 | 0.3516 | 67.4244 | 46.1985 | 61.6408 | 65.5371 | 46.61 |
| 0.0665 | 12.0 | 1200 | 0.3730 | 68.6911 | 47.7089 | 63.0381 | 66.6956 | 47.89 |
| 0.0522 | 13.0 | 1300 | 0.3891 | 67.2365 | 45.4165 | 61.4063 | 64.857 | 48.91 |
| 0.0355 | 14.0 | 1400 | 0.4128 | 69.1494 | 47.9278 | 63.3334 | 66.5969 | 50.55 |
| 0.0309 | 15.0 | 1500 | 0.4221 | 66.2447 | 44.937 | 60.1403 | 63.8541 | 50.71 |
| 0.0265 | 16.0 | 1600 | 0.4343 | 67.8178 | 46.7084 | 61.8173 | 65.4375 | 50.85 |
| 0.0158 | 17.0 | 1700 | 0.4577 | 67.9846 | 45.9562 | 61.6353 | 65.7207 | 50.81 |
| 0.0166 | 18.0 | 1800 | 0.4731 | 69.0971 | 47.7001 | 62.856 | 66.7796 | 50.01 |
| 0.0121 | 19.0 | 1900 | 0.4657 | 68.1397 | 46.4258 | 62.2696 | 65.9332 | 49.15 |
| 0.0095 | 20.0 | 2000 | 0.4793 | 68.6497 | 47.9446 | 63.0466 | 66.5409 | 50.96 |
| 0.0086 | 21.0 | 2100 | 0.4780 | 68.4363 | 46.7296 | 62.359 | 66.2626 | 50.02 |
| 0.0068 | 22.0 | 2200 | 0.4863 | 67.5415 | 46.0821 | 61.57 | 65.4613 | 49.5 |
| 0.0061 | 23.0 | 2300 | 0.4892 | 68.1283 | 46.5802 | 62.0832 | 66.0203 | 50.21 |
| 0.006 | 24.0 | 2400 | 0.4912 | 68.1723 | 46.3239 | 62.2007 | 65.6725 | 49.89 |
| 0.0057 | 25.0 | 2500 | 0.4930 | 68.3498 | 46.7534 | 62.3763 | 65.9575 | 49.82 |
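
The table reports one evaluation per epoch, and the headline results at the top of this card correspond to its last row (epoch 25). As an illustrative sketch, the snippet below finds the epochs with the lowest validation loss and the highest Rouge1, using values copied from the table. `RESULTS` and `best_epoch` are hypothetical names, not part of the training code.

```python
# (epoch, validation loss, Rouge1) copied from the training results table
RESULTS = [
    (1, 0.5339, 64.9813), (2, 0.3263, 67.5461), (3, 0.2829, 65.0704),
    (4, 0.2588, 65.5074), (5, 0.2607, 66.7346), (6, 0.2697, 63.8334),
    (7, 0.2631, 68.6188), (8, 0.2749, 69.0046), (9, 0.3117, 70.0357),
    (10, 0.3408, 65.9578), (11, 0.3516, 67.4244), (12, 0.3730, 68.6911),
    (13, 0.3891, 67.2365), (14, 0.4128, 69.1494), (15, 0.4221, 66.2447),
    (16, 0.4343, 67.8178), (17, 0.4577, 67.9846), (18, 0.4731, 69.0971),
    (19, 0.4657, 68.1397), (20, 0.4793, 68.6497), (21, 0.4780, 68.4363),
    (22, 0.4863, 67.5415), (23, 0.4892, 68.1283), (24, 0.4912, 68.1723),
    (25, 0.4930, 68.3498),
]


def best_epoch(rows, column, higher_is_better):
    """Return the epoch whose value in `column` is best."""
    pick = max if higher_is_better else min
    return pick(rows, key=lambda row: row[column])[0]


lowest_loss_epoch = best_epoch(RESULTS, column=1, higher_is_better=False)
highest_rouge1_epoch = best_epoch(RESULTS, column=2, higher_is_better=True)
print(lowest_loss_epoch, highest_rouge1_epoch)
```

On these numbers, validation loss is lowest at epoch 4 (0.2588) and Rouge1 peaks at epoch 9 (70.0357), while training loss keeps falling through epoch 25.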

Framework versions

  • Transformers 4.43.4
  • Pytorch 2.4.1+cu124
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Model size: 239M params (Safetensors, tensor type F32)