Running into "Flash attention implementation does not support kwargs: prompt_length" when using the exact example from the README
Hi folks,
thanks for the amazing work. Unfortunately, when I use the exact example from your README (Model Card), which is:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("jinaai/jina-embeddings-v3", trust_remote_code=True)

task = "retrieval.query"
embeddings = model.encode(
    ["What is the weather like in Berlin today?"],
    task=task,
    prompt_name=task,
)
```
with both `sentence-transformers==3.2.0` and `sentence-transformers==3.1.0`, I get the warning `Flash attention implementation does not support kwargs: prompt_length`. If I remove `prompt_name=task`, the warning does not occur, but the resulting embeddings are completely different. I have `flash-attn==2.6.2` installed.

Any ideas what I am missing?
Have you found a solution for this?
I am experiencing the same issue. Any advice would be greatly appreciated.
The warning comes from here:

```python
# https://huggingface.co./jinaai/xlm-roberta-flash-implementation/blob/main/modeling_xlm_roberta.py
# line 671
adapter_mask = kwargs.pop("adapter_mask", None)
if kwargs:
    for key, value in kwargs.items():
        if value is not None:
            logger.warning(
                "Flash attention implementation does not support kwargs: %s",
                key,
            )
```

The file can be found at `~/.cache/huggingface/modules/transformers_modules/jinaai/xlm-roberta-flash-implementation/12700ba4972d9e900313a85ae855f5a76fb9500e`.

Maybe we could lower the log level to debug to get rid of the warning.
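In the meantime, the warning can also be silenced on the caller's side by raising the threshold of the module's logger. A minimal sketch, assuming the remote-code module uses `logging.getLogger(__name__)` (the logger name below is therefore a guess based on the cache path and may need adjusting for your installation):

```python
import logging

# Assumption: the cached remote-code module registers its logger under a name
# derived from its import path inside transformers_modules. Raising the level
# to ERROR suppresses warning-level messages from that module only.
logging.getLogger(
    "transformers_modules.jinaai.xlm-roberta-flash-implementation"
).setLevel(logging.ERROR)
```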
Hi, thanks for reporting the issue. This warning didn't actually affect the model outputs, but I still made a modification to stop passing `prompt_length` to the model, so you shouldn't see the warning anymore.

As for the argument itself: some models need `prompt_length` to exclude the prompt tokens during pooling, but `jina-embeddings-v3` doesn't do this, so it's not relevant for us.
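To illustrate what such models do with `prompt_length`: during mean pooling they mask out the leading prompt tokens so that only the actual input contributes to the embedding. A hypothetical sketch (this helper and its signature are for illustration only, not the actual implementation in any of these repos):

```python
import torch

def mean_pool(token_embeddings, attention_mask, prompt_length=None):
    """Mean-pool token embeddings, optionally excluding the first
    `prompt_length` tokens (the instruction/prompt) from the average.

    Illustrative only: shows why a model might accept a prompt_length kwarg.
    """
    mask = attention_mask.clone()
    if prompt_length is not None:
        # Zero out the prompt positions so they don't contribute to the mean.
        mask[:, :prompt_length] = 0
    mask = mask.unsqueeze(-1).to(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)
    return summed / counts
```

With `prompt_length=None` this is ordinary masked mean pooling; with a prompt, only the tokens after the prompt are averaged, which is why passing or omitting it changes the embeddings for models that support it.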
@jupyterjazz
Thank you for the quick response. It was more a matter of confusion about whether using `prompt_name=task` was correct. The clarification and the implemented fix make perfect sense. Closing the issue!