Model Details

Model Description

Training Details

https://github.com/mit1280/fined-tuning/blob/main/Kosmos_2_fine_tune_PokemonCards_trl.ipynb

Inference Details

https://github.com/mit1280/fined-tuning/blob/main/kosmos2_fine_tuned_inference.ipynb

How to Use

from transformers import AutoProcessor, Kosmos2ForConditionalGeneration
import torch
from io import BytesIO
import requests
from PIL import Image

# The processor comes from the base Kosmos-2 checkpoint; the weights are the fine-tuned, merged model
processor = AutoProcessor.from_pretrained("microsoft/kosmos-2-patch14-224")
my_model = Kosmos2ForConditionalGeneration.from_pretrained(
    "Mit1208/Kosmos-2-PokemonCards-trl-merged",
    device_map="auto",
    low_cpu_mem_usage=True,
)

# Download the card image and fail early on an HTTP error
image_url = "https://images.pokemontcg.io/sm9/24_hires.png"
response = requests.get(image_url, timeout=30)
response.raise_for_status()
# Decode the image from the response bytes
image = Image.open(BytesIO(response.content))

prompt = "Pokemon name is"

# Move the inputs to the device the model was placed on (avoids hard-coding "cuda:0")
inputs = processor(text=prompt, images=image, return_tensors="pt").to(my_model.device)
with torch.no_grad():
    # Autoregressively generate the completion
    generated_ids = my_model.generate(**inputs, max_new_tokens=30)
# convert generated token IDs back to strings
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(generated_text.split("</image>")[-1].split(" and")[0] + ".")

# Output: Pokemon name is Wartortle.

Limitation

This model was fine-tuned on the free version of Colab, so training used only 300 samples for 85 epochs. The model hallucinates very frequently, so its output needs post-processing. Other ways to address this are to update the training data (for example, use conversation data) and/or to set the tokenizer's padding token to the tokenizer's EOS token.
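
One way the post-processing could look is sketched below. It is not taken from the training or inference notebooks: the helper name `extract_pokemon_name`, the regular expression, and the sample string are illustrative assumptions. It assumes the completion starts with the prompt and that a name is a word followed only by further capitalized words, so text after the name is discarded.

```python
import re


def extract_pokemon_name(generated_text: str, prompt: str = "Pokemon name is") -> str:
    """Return the name-like span that follows the prompt, or "" if none is found.

    Hypothetical helper: the name pattern below is an assumption, not part of the model.
    """
    # Keep only the text after the image placeholder, if the tag survived decoding.
    text = generated_text.split("</image>")[-1]
    # Without the prompt in the output there is nothing reliable to extract.
    _, found, tail = text.partition(prompt)
    if not found:
        return ""
    # Accept a first word, then any further words that start with a capital letter.
    match = re.match(r"\s*([A-Za-z][A-Za-z'\-]*(?:\s+[A-Z][A-Za-z'\-]*)*)", tail)
    return match.group(1) if match else ""


# Made-up completion used only to illustrate the trimming.
sample = "Pokemon name is Wartortle and its HP is 80"
print(extract_pokemon_name(sample))  # prints "Wartortle"
```

An empty return value can be treated as a failed generation and retried or flagged, rather than passed downstream.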

Model size: 1.66B params (Safetensors, tensor type F32)