Mac MPS/CPU support
Not working on Mac M3, not even on CPU or MPS.
I dug into this; it's supposedly an issue with the vLLM library? Any chance of moving away from this dependency in the future?
Hi @MLLife, for local execution on Mac with Metal, you can run the model directly using transformers (a sketch is included after the list below). Additionally, we've added support in llama.cpp and other runtimes based on that engine:
- Ollama
  - The Ollama model currently requires the v0.5.13-rc1 preview release, which will move to a full release soon
  - By default, the Ollama model uses `Q4_K_M` quantization for the LLM portion, but you can also see the other precisions in the full list of tags
- LM Studio
  - A collection of different precisions is available here
- llama-cli (llama.cpp)
  - Official GGUF conversions are available here: https://huggingface.co./ibm-research/granite-vision-3.2-2b-GGUF
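As mentioned above, here is a rough sketch of the transformers path on Apple silicon (using MPS when available, otherwise CPU). The repo id `ibm-granite/granite-vision-3.2-2b`, the placeholder image URL, and the chat-template input format are my assumptions here, so please double-check them against the model card before running:

```python
# Minimal sketch: running Granite Vision locally with transformers on Apple silicon.
# The model id, image URL, and prompt handling below are assumptions; check the
# model card for the exact, supported usage.
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "ibm-granite/granite-vision-3.2-2b"  # assumed HF repo id
device = "mps" if torch.backends.mps.is_available() else "cpu"
dtype = torch.float16 if device == "mps" else torch.float32

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=dtype).to(device)

# Chat-style request with one image and one question.
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/some_image.png"},  # placeholder
            {"type": "text", "text": "Describe this image."},
        ],
    },
]
inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(device)

output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))
```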
Hi @MLLife! Can you please elaborate on what errors you are seeing with vLLM and how you're installing it?
vLLM does not have support for a Metal backend yet, but it should work on CPU (although I don't have access to a Mac with an M3 chip to check). vLLM support for Apple silicon is experimental, though, so you may need to build it from source if you want to run it there. Are you building vLLM from source (i.e., similar to this)?
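For reference, once you have a CPU build of vLLM installed, something like this minimal offline-inference sketch is what I'd expect to work. The model id is an assumption on my part, and image inputs are left out (they would go through vLLM's multi-modal inputs); this just sanity-checks that the model loads and generates text:

```python
# Minimal sketch to sanity-check a CPU-only vLLM install; the model id is assumed
# and image inputs are omitted here.
from vllm import LLM, SamplingParams

llm = LLM(model="ibm-granite/granite-vision-3.2-2b")  # assumed HF repo id
params = SamplingParams(temperature=0.0, max_tokens=64)

outputs = llm.generate(["Briefly describe what Granite Vision is."], params)
print(outputs[0].outputs[0].text)
```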
@abrooks9944, thanks for the pointer. Now I am getting this issue: https://github.com/vllm-project/vllm/issues/13593