Model Card for Mistral-7B-v0.1-coco-caption-de

This model is a fine-tuned version of the Mistral-7B-v0.1 completion model, intended to produce German COCO-style captions.

The coco-karpathy-opus-de dataset was used to fine-tune the model for German image caption generation.
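A minimal sketch of loading the caption data for inspection; the dataset repository id "Jotschi/coco-karpathy-opus-de" and the split name are assumptions, not confirmed by this card:

```python
# Hedged sketch: load the German COCO caption data for inspection.
# The repo id "Jotschi/coco-karpathy-opus-de" and the "train" split are assumptions.
from datasets import load_dataset

ds = load_dataset("Jotschi/coco-karpathy-opus-de", split="train")
print(ds[0])  # inspect one caption record
```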

Model Details

Prompt format

The completion model was trained with the prompt prefix Bildbeschreibung:

Examples:

>>> Bildbeschreibung: 
2 Hunde sitzen auf einer Bank neben einer Pflanze

>>> Bildbeschreibung: Wasser
fall und Felsen vor dem Gebäude mit Blick auf den Fluss.

>>> Bildbeschreibung: Ein grünes Auto mit roten 
 Reflektoren parkte auf dem Parkplatz.
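A minimal generation sketch, assuming the model id Jotschi/Mistral-7B-v0.1-coco-caption-de and standard transformers usage; as in the examples above, an optional partial caption can be appended to the prefix and the model will complete it:

```python
# Hedged sketch of caption generation with the prompt prefix used during fine-tuning.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Jotschi/Mistral-7B-v0.1-coco-caption-de"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Prompt prefix from the examples above; a partial caption may follow it.
prompt = "Bildbeschreibung: "
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=40, do_sample=True)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```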

Uses

The model is meant to be used in conjunction with a BLIP2 Q-Former to enable image captioning, visual question answering (VQA) and chat-like conversations.

Training Details

The preliminary training script uses PEFT and DeepSpeed to run the training.

Training Data

The coco-karpathy-opus-de dataset of German COCO captions (see above) was used as training data.

Training Procedure

The model was trained using PEFT 4-bit QLoRA with the following parameters (a configuration sketch follows the list):

  • rank: 256
  • alpha: 16
  • steps: 8500
  • bf16: True
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.03
  • gradient accumulation steps: 2
  • batch size: 4
  • input sequence length: 512
  • learning rate: 2.0e-5
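A minimal sketch of a QLoRA setup matching the parameters above; the base model id, quantization details, output paths, and DeepSpeed config path are assumptions and the actual training script may differ:

```python
# Hedged sketch of the 4-bit QLoRA configuration implied by the listed hyperparameters.
# Base model id, nf4 quantization, and the DeepSpeed config path are assumptions.
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_id = "mistralai/Mistral-7B-v0.1"  # assumed base model

# 4-bit quantization for QLoRA
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# LoRA settings matching the listed rank and alpha
lora_config = LoraConfig(r=256, lora_alpha=16, task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)

# Trainer arguments matching the listed hyperparameters
training_args = TrainingArguments(
    output_dir="mistral-7b-coco-caption-de",   # placeholder
    max_steps=8500,
    bf16=True,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    gradient_accumulation_steps=2,
    per_device_train_batch_size=4,
    learning_rate=2.0e-5,
    deepspeed="ds_config.json",                # assumed DeepSpeed config path
)
```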

Postprocessing

The LoRA adapters were merged into the base model, and the merged model was saved using the PeftModel API.
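A sketch of that merge step with the PeftModel API; the adapter and output paths are placeholders:

```python
# Hedged sketch: merge the LoRA adapter into the base model and save the result.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
merged = PeftModel.from_pretrained(base, "path/to/lora-adapter").merge_and_unload()
merged.save_pretrained("Mistral-7B-v0.1-coco-caption-de-merged")  # placeholder path
```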

Framework versions

  • PEFT 0.8.2
