Getting different results for the same examples provided in sample
I tried the code sample using Sentence Transformers, with the exact same queries and docs as inputs, but my results are very different. I am running this on CPU, so I removed `.cuda()`, and I also removed `trust_remote_code=True` during model download because it expects CUDA paths.
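This is roughly the snippet I ran (a sketch; the queries and docs here are placeholders standing in for the exact inputs from the model card example):

```python
from sentence_transformers import SentenceTransformer

# Placeholders standing in for the exact queries and docs from the model card example.
queries = ["example query 1", "example query 2"]
docs = ["example passage 1", "example passage 2"]

# Changes from the model card snippet: no trust_remote_code=True and no .cuda(),
# so the model loads with the default transformers Qwen2 code and runs on CPU.
model = SentenceTransformer("dunzhang/stella_en_1.5B_v5")

query_embeddings = model.encode(queries, prompt_name="s2p_query")
doc_embeddings = model.encode(docs)
print(model.similarity(query_embeddings, doc_embeddings))
```

These are the similarities I get: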
tensor([[0.3364, 0.2758],
[0.2444, 0.2929]])
Hello!
The `trust_remote_code=True` option is still required so that this code is used: https://huggingface.co./dunzhang/stella_en_1.5B_v5/blob/main/modeling_qwen.py instead of the default code for models with the Qwen architecture.
You should get equivalent results when you re-enable that option.
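For example, something like this (an untested sketch) keeps the custom code while staying on CPU:

```python
from sentence_transformers import SentenceTransformer

# Re-enable trust_remote_code so the repository's modeling_qwen.py is used,
# but keep everything on CPU by passing device="cpu" instead of calling .cuda().
model = SentenceTransformer(
    "dunzhang/stella_en_1.5B_v5",
    trust_remote_code=True,
    device="cpu",
)

queries = ["example query"]  # same queries as in the model card example
docs = ["example passage"]   # same docs as in the model card example

query_embeddings = model.encode(queries, prompt_name="s2p_query")
doc_embeddings = model.encode(docs)
print(model.similarity(query_embeddings, doc_embeddings))
```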
- Tom Aarsen
Thanks for the response. So what I understand is that the sample example would need a GPU to get the desired result. Is that correct?
No, my apologies. The snippet from the model card, without `.cuda()`, should give the desired results on CPU.
Edit: I just realised that perhaps the custom modeling code does not work on CPU, due to the `flash-attn` requirement.
- Tom Aarsen
Yes, `flash-attn` is not supported on CPU, and it is a requirement even for the model card sample.
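For anyone who wants to experiment anyway, one thing that might be worth trying is to request a non-flash attention backend when loading. This is an untested sketch and assumes the custom modeling code honours the standard `attn_implementation` switch; if it imports `flash_attn` unconditionally, it will still fail without a GPU:

```python
from sentence_transformers import SentenceTransformer

# Untested idea: ask transformers for a non-flash attention implementation.
# This only helps if the repository's custom modeling_qwen.py supports the standard
# attn_implementation argument; if flash_attn is imported unconditionally, it will
# still fail on CPU.
model = SentenceTransformer(
    "dunzhang/stella_en_1.5B_v5",
    trust_remote_code=True,
    device="cpu",
    model_kwargs={"attn_implementation": "eager"},  # or "sdpa"
)
```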