<|begin_of_text|> is added twice by the preprocessor

#44
by kz919 - opened
import requests
import torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"

model = MllamaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

url = "https://huggingface.co./datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg"
image = Image.open(requests.get(url, stream=True).raw)

messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "If I had to write a haiku for this one, it would be: "}
    ]}
]
input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, input_text, return_tensors="pt").to(model.device)


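# Inspect the tokenized prompt: 128000 (<|begin_of_text|>) shows up twice at the start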
print(inputs.input_ids)
print(processor.tokenizer.decode(inputs.input_ids[0, :6]))
output = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(output[0]))
tensor([[128000, 128000, 128006,    882, 128007,    271, 128256,   2746,    358,
           1047,    311,   3350,    264,   6520,  39342,    369,    420,    832,
             11,    433,   1053,    387,     25,    220, 128009, 128006,  78191,
         128007,    271]], device='cuda:0')
'<|begin_of_text|><|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n'

The token 128000 (<|begin_of_text|>) is added twice when running the given example. Does it impact the quality of the model?

It does not seem to have much impact on model quality, but you can remove the extra bos_token by passing add_special_tokens=False to the processor: apply_chat_template has already inserted all the necessary special tokens (including <|begin_of_text|>), so the tokenizer should not add them again.

inputs = processor(image, input_text, return_tensors="pt", add_special_tokens=False).to(model.device)
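
After this change, decoding the prompt prefix again should show only a single <|begin_of_text|>. A quick check, reusing the objects from the snippet above:

# The duplicated 128000 should be gone; only one bos_token remains at the start
print(inputs.input_ids[0, :6])
print(processor.tokenizer.decode(inputs.input_ids[0, :6]))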
