Running into "Flash attention implementation does not support kwargs: prompt_length" when using the exact example from the README
Hi folks,
thanks for the amazing work. Unfortunately, when I use the exact example from your README (Model Card), which is:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("jinaai/jina-embeddings-v3", trust_remote_code=True)

task = "retrieval.query"
embeddings = model.encode(
    ["What is the weather like in Berlin today?"],
    task=task,
    prompt_name=task,
)
```
with both `sentence-transformers==3.2.0` and `sentence-transformers==3.1.0`, I get the warning `Flash attention implementation does not support kwargs: prompt_length`. If I remove `prompt_name=task`, the warning does not occur, but the resulting embeddings are completely different. I have `flash-attn==2.6.2` installed.

Any ideas what I am missing?
Have you found a solution for this?
I am experiencing the same issue. Any advice would be greatly appreciated.
The warning comes from here:

```python
# https://huggingface.co./jinaai/xlm-roberta-flash-implementation/blob/main/modeling_xlm_roberta.py
# line 671
adapter_mask = kwargs.pop("adapter_mask", None)
if kwargs:
    for key, value in kwargs.items():
        if value is not None:
            logger.warning(
                "Flash attention implementation does not support kwargs: %s",
                key,
            )
```

The file can be found at `~/.cache/huggingface/modules/transformers_modules/jinaai/xlm-roberta-flash-implementation/12700ba4972d9e900313a85ae855f5a76fb9500e`.

Maybe we could lower the log level to debug to get rid of the warning.
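In the meantime, the warning can also be silenced on the caller's side by raising the threshold of the module's logger. A minimal sketch, assuming the remote-code module uses `logging.getLogger(__name__)` (the logger name below is therefore a guess based on the cache path and may need adjusting for your installation):

```python
import logging

# Assumption: the cached remote-code module registers its logger under a name
# derived from its import path inside transformers_modules. Raising the level
# to ERROR suppresses warning-level messages from that module only.
logging.getLogger(
    "transformers_modules.jinaai.xlm-roberta-flash-implementation"
).setLevel(logging.ERROR)
```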
Hi, thanks for reporting the issue. This warning didn't actually affect the model outputs, but I still made a modification to stop passing `prompt_length` to the model, so you shouldn't see the warning anymore.

As for the argument itself: some models need `prompt_length` to exclude the prompt tokens during pooling, but `jina-embeddings-v3` doesn't do this, so it's not relevant for us.
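To illustrate what such models do with `prompt_length`: during mean pooling they mask out the leading prompt tokens so that only the actual input contributes to the embedding. A hypothetical sketch (this helper and its signature are for illustration only, not the actual implementation in any of these repos):

```python
import torch

def mean_pool(token_embeddings, attention_mask, prompt_length=None):
    """Mean-pool token embeddings, optionally excluding the first
    `prompt_length` tokens (the instruction/prompt) from the average.

    Illustrative only: shows why a model might accept a prompt_length kwarg.
    """
    mask = attention_mask.clone()
    if prompt_length is not None:
        # Zero out the prompt positions so they don't contribute to the mean.
        mask[:, :prompt_length] = 0
    mask = mask.unsqueeze(-1).to(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)
    return summed / counts
```

With `prompt_length=None` this is ordinary masked mean pooling; with a prompt, only the tokens after the prompt are averaged, which is why passing or omitting it changes the embeddings for models that support it.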
@jupyterjazz
Thank you for the quick response. It was more a matter of confusion about whether using `prompt_name=task` was correct. The clarification and the implemented fix make perfect sense. Closing the issue!