Getting different results for the same examples provided in sample
I tried the code sample using Sentence Transformers, with the exact same queries and docs as inputs, but my results are very different. I am running this on CPU, so I removed `.cuda()`, and I also removed `trust_remote_code=True` during model download because it expects CUDA paths.
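This is roughly the snippet I ran (a sketch; the queries and docs here are placeholders standing in for the exact inputs from the model card example):

```python
from sentence_transformers import SentenceTransformer

# Placeholders standing in for the exact queries and docs from the model card example.
queries = ["example query 1", "example query 2"]
docs = ["example passage 1", "example passage 2"]

# Changes from the model card snippet: no trust_remote_code=True and no .cuda(),
# so the model loads with the default transformers Qwen2 code and runs on CPU.
model = SentenceTransformer("dunzhang/stella_en_1.5B_v5")

query_embeddings = model.encode(queries, prompt_name="s2p_query")
doc_embeddings = model.encode(docs)
print(model.similarity(query_embeddings, doc_embeddings))
```

These are the similarities I get: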
tensor([[0.3364, 0.2758],
[0.2444, 0.2929]])
Hello!
The `trust_remote_code=True` option is still required so that this code is used: https://huggingface.co./dunzhang/stella_en_1.5B_v5/blob/main/modeling_qwen.py instead of the default code for models with the Qwen architecture.
You should get equivalent results when you re-enable that option.
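For example, something like this (an untested sketch) keeps the custom code while staying on CPU:

```python
from sentence_transformers import SentenceTransformer

# Re-enable trust_remote_code so the repository's modeling_qwen.py is used,
# but keep everything on CPU by passing device="cpu" instead of calling .cuda().
model = SentenceTransformer(
    "dunzhang/stella_en_1.5B_v5",
    trust_remote_code=True,
    device="cpu",
)

queries = ["example query"]  # same queries as in the model card example
docs = ["example passage"]   # same docs as in the model card example

query_embeddings = model.encode(queries, prompt_name="s2p_query")
doc_embeddings = model.encode(docs)
print(model.similarity(query_embeddings, doc_embeddings))
```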
- Tom Aarsen
Thanks for the response. So what I understand is that the sample example would need a GPU to get the desired result. Is that correct?
No, my apologies. The snippet from the model card, without `.cuda()`, should give the desired results on CPU.
Edit: I just realised that perhaps the custom modeling code does not work on CPU, due to the `flash-attn` requirement.
- Tom Aarsen
Yes, `flash-attn` is not supported on CPU, and it is a requirement even for the model card sample.
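For anyone who wants to experiment anyway, one thing that might be worth trying is to request a non-flash attention backend when loading. This is an untested sketch and assumes the custom modeling code honours the standard `attn_implementation` switch; if it imports `flash_attn` unconditionally, it will still fail without a GPU:

```python
from sentence_transformers import SentenceTransformer

# Untested idea: ask transformers for a non-flash attention implementation.
# This only helps if the repository's custom modeling_qwen.py supports the standard
# attn_implementation argument; if flash_attn is imported unconditionally, it will
# still fail on CPU.
model = SentenceTransformer(
    "dunzhang/stella_en_1.5B_v5",
    trust_remote_code=True,
    device="cpu",
    model_kwargs={"attn_implementation": "eager"},  # or "sdpa"
)
```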