<|begin_of_text|> is added twice by the preprocessor
#44 opened by kz919
```python
import requests
import torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"

model = MllamaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg"
image = Image.open(requests.get(url, stream=True).raw)

messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "If I had to write a haiku for this one, it would be: "}
    ]}
]
input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, input_text, return_tensors="pt").to(model.device)

# Inspect the token IDs and decode the first six tokens of the prompt
print(inputs.input_ids)
print(processor.tokenizer.decode(inputs.input_ids[0, :6]))

output = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(output[0]))
```
```
tensor([[128000, 128000, 128006, 882, 128007, 271, 128256, 2746, 358,
         1047, 311, 3350, 264, 6520, 39342, 369, 420, 832,
         11, 433, 1053, 387, 25, 220, 128009, 128006, 78191,
         128007, 271]], device='cuda:0')
'<|begin_of_text|><|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n'
```
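The duplication shows up as two consecutive 128000 IDs at the start of `input_ids`. As a quick sanity check before calling `generate`, a plain-Python sketch like the one below counts how many BOS tokens open a sequence. It assumes 128000 is the `<|begin_of_text|>` ID (as the decoded output shows), and `count_leading_bos` is a hypothetical helper name, not part of transformers. On real inputs you would pass `inputs.input_ids[0].tolist()`.

```python
# Assumption: 128000 is the <|begin_of_text|> (BOS) ID, as the decoded output above shows.
BOS_ID = 128000

def count_leading_bos(ids, bos_id=BOS_ID):
    """Return how many consecutive BOS tokens open the sequence (hypothetical helper)."""
    n = 0
    for tok in ids:
        if tok != bos_id:
            break
        n += 1
    return n

# First tokens copied from the printed input_ids above
ids = [128000, 128000, 128006, 882, 128007, 271]
print(count_leading_bos(ids))  # 2
if count_leading_bos(ids) > 1:
    print("warning: duplicated BOS at the start of the prompt")
```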
Token ID 128000 (`<|begin_of_text|>`) is added twice when running the example above. Does this affect the quality of the model's output?
It does not seem to have much impact on model quality, but you can remove the extra `bos_token` by passing `add_special_tokens=False` to the processor call, since `apply_chat_template` has already inserted all the necessary special tokens:

```python
inputs = processor(image, input_text, return_tensors="pt", add_special_tokens=False).to(model.device)
```
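Why the flag helps can be illustrated with a toy model of the two steps. The sketch below uses a hypothetical whitespace tokenizer (`toy_tokenize`), not the Llama tokenizer or any transformers API: the templated prompt already begins with `<|begin_of_text|>`, and a tokenizer call with special tokens enabled prepends a second one.

```python
# Hypothetical toy tokenizer: it only mimics the shape of the problem and is
# not the Llama tokenizer or part of the transformers API.
BOS = "<|begin_of_text|>"

def toy_tokenize(text, add_special_tokens=True):
    """Split on whitespace and, by default, prepend BOS like a standard tokenizer call."""
    tokens = text.split()
    return [BOS] + tokens if add_special_tokens else tokens

# The chat template has already rendered BOS at the start of the prompt text.
templated = BOS + " <|start_header_id|>user<|end_header_id|> hello"

with_special = toy_tokenize(templated)
without_special = toy_tokenize(templated, add_special_tokens=False)
print(with_special.count(BOS))     # 2: template BOS + tokenizer BOS
print(without_special.count(BOS))  # 1: only the template BOS
```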