Pavel Iakubovskii

qubvel-hf

AI & ML interests

Computer Vision models

Recent Activity

Organizations

Hugging Face, PyTorch Image Models, Peking University, Hugging Face Internal Testing Organization, Huggingface Projects, Hugging Face OSS Metrics, Hugging Face for Computer Vision, kotol, yorg, CVPR2024, Hugging Face Discord Community, nltpt, s0409, Segmentation Models Pytorch, smp-test, University of Sydney, s0225, ETH Zurich - Computer Vision and Geometry Lab

qubvel-hf's activity

upvoted an article 6 days ago

SmolVLM2: Bringing Video Understanding to Every Device

• 202
reacted to clem's post with 🔥 6 days ago
New activity in facebook/sam-vit-large 8 days ago

Update code snippet

#6 opened 8 days ago by
qubvel-hf
New activity in facebook/sam-vit-huge 8 days ago

Update code snippet

#11 opened 8 days ago by
qubvel-hf
New activity in facebook/sam-vit-base 8 days ago

Update code snippet

#8 opened 8 days ago by
qubvel-hf
upvoted an article 10 days ago

btw, I also observed that "." and a capitalized template influence the confidence quite a bit


Not sure what's up, as I'm not familiar with this codebase (and have no time to dig in), but for SigLIP what you're supposed to compute is sigmoid(zimg @ ztxt * temperature + bias)

From what you describe, I would bet the bias and/or temperature are missing?
The ground-truth reference code is https://colab.research.google.com/github/google-research/big_vision/blob/main/big_vision/configs/proj/image_text/SigLIP2_demo.ipynb

Hey @giffmana, temperature and bias are applied under the hood, see:

Siglip
https://github.com/huggingface/transformers/blob/17792556b21b4da0dbb9e4b59b39fb34aae4047c/src/transformers/models/siglip/modeling_siglip.py#L1411-L1417

Siglip2
https://github.com/huggingface/transformers/blob/17792556b21b4da0dbb9e4b59b39fb34aae4047c/src/transformers/models/siglip2/modeling_siglip2.py#L1459-L1465
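
For readers following the thread, here is a minimal sketch of the scoring formula quoted above, sigmoid(zimg @ ztxt * temperature + bias), written in plain Python for a single image-text pair. It is not the transformers implementation. The function names, the example embeddings, and the temperature and bias values are all hypothetical, and the sketch assumes the embeddings are L2-normalized and that the temperature is passed in as an already-positive scale.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumed preprocessing for this sketch)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def siglip_probability(zimg, ztxt, temperature, bias):
    """Return sigmoid(zimg . ztxt * temperature + bias) for one image-text pair.

    `temperature` and `bias` stand in for the model's learned scalars;
    the values used below are made up for illustration.
    """
    zimg = l2_normalize(zimg)
    ztxt = l2_normalize(ztxt)
    similarity = sum(a * b for a, b in zip(zimg, ztxt))
    logit = similarity * temperature + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical 2-D embeddings: one matching pair, one orthogonal pair.
image_embedding = [1.0, 0.0]
matching_text = [2.0, 0.0]
unrelated_text = [0.0, 3.0]

# With temperature and bias applied, matching and unrelated pairs separate clearly.
match_score = siglip_probability(image_embedding, matching_text, temperature=10.0, bias=-10.0)
unrelated_score = siglip_probability(image_embedding, unrelated_text, temperature=10.0, bias=-10.0)

# With both terms left out (temperature=1, bias=0), the two scores sit much closer together.
match_plain = siglip_probability(image_embedding, matching_text, temperature=1.0, bias=0.0)
unrelated_plain = siglip_probability(image_embedding, unrelated_text, temperature=1.0, bias=0.0)

print(match_score, unrelated_score, match_plain, unrelated_plain)
```

The comparison at the end shows why missing temperature or bias was a reasonable first guess: without them, cosine similarities in [-1, 1] map to a narrow band of probabilities around 0.5, so confidences look compressed even when the ranking is unchanged.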