Seems like the user prompt is ignored
Thanks for contributing the HuggingFaceM4/idefics2-8b model; unfortunately, I'm having trouble with it.
It seems that the user prompt is ignored and that the model always answers as if the question were "Describe the image".
I ran into this problem while computing a VQAv2 metric.
To confirm this, I ran the provided decoding example, in which the user asks "What's the difference between these two images?".
Here is the script output:

```
User: What’s the difference between these two images?<image><image><end_of_utterance>
Assistant:
Generated text: User: What’s the difference between these two images?
Assistant: A dog and a cat are sleeping on a couch.
```
I would instead have expected something closer to the ground-truth answer given in the next (training) example, i.e. "The difference is that one image is about dogs and the other one about cats."
Any ideas?
Thanks
JL
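For reference, the prompt string shown in the output above follows the idefics2 chat format. A minimal sketch of how such a string is assembled, with no model required (the `format_idefics2_prompt` helper is hypothetical, written here only to make the expected format explicit):

```python
def format_idefics2_prompt(question: str, num_images: int) -> str:
    """Build an idefics2-style user turn followed by an empty assistant turn.

    Hypothetical helper mirroring the format produced by the processor's
    chat template: "User: <text><image>...<end_of_utterance>\nAssistant:"
    """
    image_tokens = "<image>" * num_images
    return f"User: {question}{image_tokens}<end_of_utterance>\nAssistant:"

prompt = format_idefics2_prompt("What's the difference between these two images?", 2)
print(prompt)
# User: What's the difference between these two images?<image><image><end_of_utterance>
# Assistant:
```

The question text is clearly present in the prompt, which is why the answers above look like the text part is being ignored.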
This is definitely a problem: changing the prompt returns the same answer.
```
User: Is there a cow in the image? Yes or no?<image><image><end_of_utterance>
Assistant:
Generated text: User: Is there a cow in the image? Yes or no?
Assistant: The dog and cat are sleeping on the couch.
```
In case it helps, my settings are:
- `transformers` version: 4.47.0
- Platform: Linux-4.18.0-553.27.1.el8_10.x86_64-x86_64-with-glibc2.28
- Python version: 3.12.8
- Huggingface_hub version: 0.27.0
- Safetensors version: 0.4.5
- Accelerate version: not installed
- Accelerate config: not found
- PyTorch version (GPU?): 2.5.1+cu124 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: single process
- Using GPU in script?: 1 GPU
- GPU type: NVIDIA A100-SXM4-80GB
Can you try with `transformers==4.40.0` to see if you have the same problem? It should work with that version.