kiddobellamy committed
Commit 7bb259c
1 Parent(s): a5cdac2

Create README.md

# Meta Llama 3.2 90B Vision-Instruct Model

This repository contains the **Meta Llama 3.2 90B Vision-Instruct** model, an instruction-tuned vision-language model developed by Meta AI. It is designed for image-understanding tasks such as image captioning, visual question answering, and image classification.

## Model Description

The Llama 3.2 90B Vision-Instruct model pairs the Llama language model with an image encoder and adapter layers, enabling it to process and reason over both text and images.

### Key Features:
- **Architecture**: Llama-based transformer with added vision components for image-and-text tasks.
- **Parameters**: approximately 90 billion.
- **Training data**: large-scale paired image and text data to support multimodal tasks.

## How to Use

You can use this model for tasks such as generating image captions and answering questions about images. Below is an example of how to load the model and generate a description for an image with the Hugging Face `transformers` library (Llama 3.2 Vision support requires a recent version of `transformers`).

```python
import torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor

# Load the model and processor (the processor handles both image and text inputs)
model_id = "kiddobellamy/Llama_Vision"
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Example usage: describe a local image (replace example.jpg with your own file)
image = Image.open("example.jpg")
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe the following image."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)

# Output the generated description
print(processor.decode(outputs[0], skip_special_tokens=True))
```
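
With roughly 90 billion parameters, the model requires a large amount of GPU memory even in 16-bit precision (on the order of 180 GB for the weights alone). The snippet below is a minimal sketch of one way to reduce that footprint by loading the weights in 4-bit with `bitsandbytes`; it assumes `bitsandbytes` is installed and that the repository contains a standard Llama 3.2 Vision checkpoint, and it is not an officially recommended configuration.

```python
import torch
from transformers import AutoProcessor, BitsAndBytesConfig, MllamaForConditionalGeneration

# Quantize weights to 4-bit at load time (assumes `bitsandbytes` and a CUDA GPU).
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = MllamaForConditionalGeneration.from_pretrained(
    "kiddobellamy/Llama_Vision",
    quantization_config=quant_config,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("kiddobellamy/Llama_Vision")
```

Even quantized to 4-bit, a model of this size typically still needs tens of gigabytes of GPU memory, so `device_map="auto"` is used to spread the weights across whatever GPUs are available.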

Files changed (1)
  1. README.md +15 -0
README.md ADDED
@@ -0,0 +1,15 @@
+ ---
+ license: mit
+ language:
+ - es
+ - en
+ - fr
+ base_model:
+ - meta-llama/Llama-3.2-90B-Vision-Instruct
+ pipeline_tag: video-text-to-text
+ library_name: transformers
+ tags:
+ - llama
+ - multimodal
+ - meta-ai
+ ---