---
license: apache-2.0
tags:
- merge
base_model:
- CohereForAI/aya-23-8B
- google/siglip-base-patch16-256-multilingual
datasets:
- maya-multimodal/pretrain
- MBZUAI/palo_multilingual_dataset
language:
- en
- hi
- fr
- ru
- zh
- ar
- ja
- es
pipeline_tag: image-text-to-text
library_name: transformers
---

# Maya: A Multilingual Vision Language Model

Maya is an instruction-finetuned multilingual multimodal model that expands multimodal capabilities to eight languages with an emphasis on data quality and cultural sensitivity. Built on the LLaVA framework, Maya includes a newly created pre-training dataset designed to support multilingual and culturally aware VLM development.

## Model Description

- **Developed by:** Cohere For AI Community
- **Model type:** Multimodal Vision-Language Model
- **Language(s):** English, Chinese, French, Spanish, Russian, Japanese, Arabic, Hindi
- **License:** Apache 2.0
- **Related Paper:** [Maya: An Instruction Finetuned Multilingual Multimodal Model](https://arxiv.org/abs/2412.07112)

## Model Details

Maya uses a lightweight architecture to provide a compact yet powerful multimodal experience, with several key features:

- Built on the LLaVA framework using the Aya-23 8B language model
- Uses SigLIP for vision encoding with multilingual adaptability
- Supports 8 languages with strong cultural understanding
- Trained on a toxicity-filtered dataset for safer deployment

### Model Architecture

- **Base Model:** Aya-23 8B
- **Vision Encoder:** SigLIP (multilingual)
- **Training Data:** 558,000 images with multilingual annotations
- **Context Length:** 8K tokens
- **Parameters:** 8 billion

## Intended Uses

Maya is designed for:

- Multilingual visual question answering
- Cross-cultural image understanding
- Image captioning in multiple languages
- Visual reasoning tasks
- Document understanding

## Usage

```bash
# Clone the GitHub repository
git clone https://github.com/nahidalam/maya

# Change the working directory
cd maya
```

```python
# Run the following code
from llava.eval.talk2maya import run_vqa_model

# Define inputs
question = "Try to identify what plane this is, based on the design."
image_path = "./llava/eval/claude_plane_test_2.jpeg"

# Run model
answer = run_vqa_model(
    question=question,
    image_file=image_path
)
```
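Because Maya accepts prompts in any of its eight supported languages, the same entry point can be reused for multilingual questions. The snippet below is a minimal sketch that only assumes `run_vqa_model` behaves as in the example above; the Hindi question string is an illustration, not taken from the repository.

```python
# Minimal sketch: same documented entry point, with a Hindi prompt.
# Assumes run_vqa_model accepts free-form text in any supported language.
from llava.eval.talk2maya import run_vqa_model

# "What is shown in this picture? Describe it in one sentence." (Hindi)
question_hi = "इस तस्वीर में क्या दिखाया गया है? एक वाक्य में वर्णन करें।"
image_path = "./llava/eval/claude_plane_test_2.jpeg"

answer_hi = run_vqa_model(
    question=question_hi,
    image_file=image_path
)
print(answer_hi)
```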
## Limitations

- Limited to 8 languages currently
- Requires high-quality images for optimal performance
- May not capture nuanced cultural contexts in all cases
- Performance varies across languages and tasks

## Bias, Risks, and Limitations

Maya has been developed with attention to bias mitigation and safety:

- Dataset filtered for toxic content
- Cultural sensitivity evaluations performed
- Regular bias assessments conducted
- Limited to high-quality, vetted training data

However, users should be aware that:

- The model may still exhibit biases present in the training data
- Performance may vary across different cultural contexts
- Not suitable for critical decision-making applications

## Training Details

Maya was trained using:

- 558,000 curated images
- Multilingual annotations in 8 languages
- A toxicity-filtered dataset
- 8x H100 GPUs with 80 GB of memory each
- Batch size of 32 (per device)
- Learning rate of 1e-3 with a cosine scheduler

## Citation

```bibtex
@misc{alam2024mayainstructionfinetunedmultilingual,
      title={Maya: An Instruction Finetuned Multilingual Multimodal Model},
      author={Nahid Alam and Karthik Reddy Kanjula and Surya Guthikonda and Timothy Chung and Bala Krishna S Vegesna and Abhipsha Das and Anthony Susevski and Ryan Sze-Yin Chan and S M Iftekhar Uddin and Shayekh Bin Islam and Roshan Santhosh and Snegha A and Drishti Sharma and Chen Liu and Isha Chaturvedi and Genta Indra Winata and Ashvanth. S and Snehanshu Mukherjee and Alham Fikri Aji},
      year={2024},
      eprint={2412.07112},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.07112},
}
```

## Contact

For questions or feedback about Maya, please:

- Open an issue on our [GitHub repository](https://github.com/nahidalam/maya)
- Contact the maintainers at: nahid.m.alam@gmail.com, maya.c4ai@gmail.com