Fast inference engine
#2 by SinanAkkoyun · opened
Hello,
I understand why you can't use Llama, but please work on a vLLM PR when you drop a new architecture, as DeepSeek does.
Thank you