Tags: Image-Text-to-Text · Transformers · Safetensors · lora · Inference Endpoints
pixtral_aerial_VQA_adapter

Model Details

  • Type: LoRA Adapter
  • Total Parameters: 6,225,920
  • Memory Usage: 23.75 MB
  • Precision: torch.float32
  • Layer Types:
    • lora_A: 40
    • lora_B: 40
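
The memory figure above is consistent with the parameter count at float32 (4 bytes per parameter); a quick sanity check:

```python
# Sanity check: 6,225,920 float32 parameters -> adapter memory in MB (MiB).
num_params = 6_225_920
bytes_per_param = 4  # torch.float32
mem_mib = num_params * bytes_per_param / (1024 ** 2)
print(f"{mem_mib:.2f} MB")  # -> 23.75 MB
```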

Intended Use

  • Primary intended use: processing aerial footage of construction sites for structural and construction surveying.
  • Secondary use: detailed visual question answering (VQA) over other kinds of aerial footage.

Training Data

  • Dataset:
    1. FloodNet Track 2 dataset
    2. Subset of FGVC Aircraft dataset
    3. Custom dataset of 10 image-caption pairs created using Pixtral

Training Procedure

  • Training method: LoRA (Low-Rank Adaptation)
  • Base model: Ertugrul/Pixtral-12B-Captioner-Relaxed
  • Training hardware: Nebius-hosted NVIDIA H100 machine

Citation

@misc{rahnemoonfar2020floodnet,
  title={FloodNet: A High Resolution Aerial Imagery Dataset for Post Flood Scene Understanding},
  author={Maryam Rahnemoonfar and Tashnim Chowdhury and Argho Sarkar and Debvrat Varshney and Masoud Yari and Robin Murphy},
  year={2020},
  eprint={2012.02951},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  doi={10.48550/arXiv.2012.02951}
}