# Meta Llama 3.2 90B Vision-Instruct Model
This repository contains the **Meta Llama 3.2 90B Vision-Instruct** model, an instruction-tuned vision-language model developed by Meta AI. It is designed for vision-language tasks such as image captioning, visual question answering, and image classification.
## Model Description
The Llama 3.2 90B Vision-Instruct model pairs the Llama language-model backbone with an image encoder integrated through cross-attention layers, enabling it to process and understand both text and images.
### Key Features:
- **Architecture**: LLaMA-based model for vision and language tasks.
- **Parameter Size**: 90B parameters.
- **Pretrained on**: A wide range of image and text data to support multimodal tasks.
## How to Use
You can use this model for tasks such as generating captions for images, answering questions about them, and other image-grounded text generation. Below is an example of how to load and use the model with the Hugging Face `transformers` library. Because this is a vision-language checkpoint, it is loaded with `MllamaForConditionalGeneration` and `AutoProcessor` (both available in recent `transformers` releases); `your_image.jpg` is a placeholder for a path to your own image.
```python
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

# Load the model and processor (the processor prepares both image and text inputs)
model = MllamaForConditionalGeneration.from_pretrained(
    "kiddobellamy/Llama_Vision", torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained("kiddobellamy/Llama_Vision")

# Example usage: describe an image ("your_image.jpg" is a placeholder path)
image = Image.open("your_image.jpg")
messages = [{"role": "user", "content": [{"type": "image"}, {"type": "text", "text": "Describe the following image."}]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, add_special_tokens=False, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
# Print the generated description
print(processor.decode(outputs[0], skip_special_tokens=True))
```
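At 90B parameters, the weights alone take on the order of 180 GB of GPU memory in bfloat16, so in practice the model is either sharded across several GPUs (as `device_map="auto"` does above) or quantized. Below is a minimal sketch of 4-bit loading, assuming `bitsandbytes` is installed and that this checkpoint follows the same Llama 3.2 Vision (Mllama) format as the base model; generation then works exactly as in the example above.

```python
import torch
from transformers import AutoProcessor, BitsAndBytesConfig, MllamaForConditionalGeneration

# Quantize weights to 4-bit on load, cutting memory use roughly 4x vs. bfloat16
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

model = MllamaForConditionalGeneration.from_pretrained(
    "kiddobellamy/Llama_Vision",     # assumes the weights match the Llama 3.2 Vision (Mllama) format
    quantization_config=bnb_config,
    device_map="auto",               # shard the quantized model across all visible GPUs
)
processor = AutoProcessor.from_pretrained("kiddobellamy/Llama_Vision")
```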
---
license: mit
language:
- es
- en
- fr
base_model:
- meta-llama/Llama-3.2-90B-Vision-Instruct
pipeline_tag: video-text-to-text
library_name: transformers
tags:
- llama
- multimodal
- meta-ai
---