How to bypass the flash attention 2 requirement on Apple Silicon?
#63 · opened by MC-QQ
I got an M4 Mac mini and tried to run this model. I get the following error:
Library/Python/3.9/lib/python/site-packages/transformers/modeling_utils.py", line 1659, in _check_and_enable_flash_attn_2
raise ImportError(f"{preface} the package flash_attn seems to be not installed. {install_message}")
ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package flash_attn seems to be not installed. Please refer to the documentation of https://huggingface.co./docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2.
I tried the approach in https://huggingface.co./qnguyen3/nanoLLaVA-1.5/discussions/4, but it didn't work.
Any suggestions? Thanks a ton!
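For reference, the workaround I've seen suggested most often is to force a non-flash attention implementation at load time, both through the `attn_implementation` argument and on the config object, in case the repo's custom code reads the setting from the config. A minimal sketch, assuming the model loads through transformers' `AutoModelForCausalLM` with `trust_remote_code=True`; the repo id below is a placeholder, not the actual model id:

```python
# Sketch of the attn_implementation override (assumptions noted above).
import torch
from transformers import AutoConfig, AutoModelForCausalLM

MODEL_ID = "org/model-name"  # placeholder -- substitute the real repo id

# Clear any flash_attention_2 setting baked into the repo's config, in case
# the custom modeling code reads it from there instead of the kwarg.
config = AutoConfig.from_pretrained(MODEL_ID, trust_remote_code=True)
if getattr(config, "_attn_implementation", None) == "flash_attention_2":
    config._attn_implementation = "eager"

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    config=config,
    attn_implementation="eager",  # or "sdpa"; both avoid the flash_attn import
    torch_dtype=torch.float16,
    trust_remote_code=True,
)
```

If the repo's custom modeling code imports flash_attn unconditionally, neither override will help and the downloaded modeling file itself would need patching.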
Hopefully it's available in mlx-vlm at some point...
Exact same issue on my side. There are some articles showing a successful run on a Mac, but it doesn't work for me:
https://www.danielcorin.com/til/deekseek/janus-pro-local/
Is it caused by a module version inconsistency?
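One way to check that is to compare package versions and MPS availability against the environment used in the article. A quick sketch; the package list here is just the usual suspects, not a confirmed set:

```python
# Print the versions most likely to matter for this error, plus MPS status,
# so the two environments can be compared side by side.
import platform
from importlib import metadata

import torch

for pkg in ("transformers", "torch", "accelerate", "tokenizers"):
    try:
        print(f"{pkg}: {metadata.version(pkg)}")
    except metadata.PackageNotFoundError:
        print(f"{pkg}: not installed")

print("python:", platform.python_version())
print("MPS available:", torch.backends.mps.is_available())
```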