Fast inference engine
#2 by SinanAkkoyun · opened
Hello,
I understand why you can't use Llama, but please work on a vLLM PR when you drop a new architecture, as DeepSeek does.
Thank you