mitkox posted an update 3 days ago
llama.cpp is 26.8% faster than ollama.
I have upgraded both and, using the same settings, I am running the same DeepSeek R1 Distill 1.5B on the same hardware. It's an apples-to-apples comparison.

Total duration:
llama.cpp 6.85 sec <- 26.8% faster
ollama 8.69 sec

Breakdown by phase:
Model loading
llama.cpp 241 ms <- 2x faster
ollama 553 ms

Prompt processing
llama.cpp 416.04 tokens/s with an eval time of 45.67 ms <- 10x faster
ollama 42.17 tokens/s with an eval time of 498 ms

Token generation
llama.cpp 137.79 tokens/s with an eval time of 6.62 sec <- 13% faster
ollama 122.07 tokens/s with an eval time of 7.64 sec

llama.cpp is LLM inference in C/C++; ollama adds abstraction layers and marketing.

Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.
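For anyone who wants to sanity-check numbers like these, here is a minimal timing sketch using the llama-cpp-python bindings. It is not the exact harness used above: the GGUF filename, prompt, context size, and GPU offload setting are placeholders. With verbose=True, llama.cpp also prints its own per-phase timings (load, prompt eval, token generation), which is where a breakdown like the one above comes from.

```python
import time
from llama_cpp import Llama

t0 = time.perf_counter()
llm = Llama(
    model_path="DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf",  # placeholder filename
    n_ctx=4096,
    n_gpu_layers=-1,   # offload all layers if a GPU backend was compiled in
    verbose=True,      # llama.cpp prints its load / prompt-eval / eval timings
)
load_s = time.perf_counter() - t0

t1 = time.perf_counter()
out = llm("Explain the Pythagorean theorem in one short paragraph.", max_tokens=256)
gen_s = time.perf_counter() - t1

usage = out["usage"]
print(f"model load : {load_s:.2f} s")
print(f"end-to-end : {gen_s:.2f} s")
print(f"throughput : {usage['completion_tokens'] / gen_s:.1f} tok/s "
      f"({usage['completion_tokens']} completion tokens)")
```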

Thanks for doing this! I've been all-in on llama.cpp for a while now, but I would be lying if I said I didn't wonder whether I was missing out on anything with other engines.

Ever tried one of the other llama.cpp-based engines like LM Studio & KoboldCpp and done a speed test against llama.cpp? It'd be interesting to see the speed difference between those two.

I believe SGLang would be even faster, but I'm not sure if it supports non-NVIDIA devices.

How did you get DeepSeek R1 Qwen 1.5B running with llama.cpp? I made a small app with the llama.cpp bindings for Python, but when trying to load this new model I get an error saying the model is not supported. Please help!


I don't think the pip release of llama-cpp-python supports it yet; I think it's only in the GitHub version. The CUDA build tends to fail, though...
https://github.com/abetlen/llama-cpp-python/issues/1900
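A rough sketch of that workaround, assuming the newer bindings from GitHub are what's needed for this model. The GGUF path is a placeholder for a locally downloaded distill, and the CMake flag is an assumption; see the linked issue if the compile step fails.

```python
# Installing from the GitHub repo rebuilds the vendored llama.cpp, which is
# usually what picks up support for new model architectures/tokenizers:
#
#   pip install --upgrade --force-reinstall --no-cache-dir \
#       "llama-cpp-python @ git+https://github.com/abetlen/llama-cpp-python"
#
# For a CUDA build, CMAKE_ARGS (e.g. "-DGGML_CUDA=on") is set before running
# the pip command; that compile step is the part reported as flaky above.

from llama_cpp import Llama

# Placeholder path to a locally downloaded GGUF of the distill model.
llm = Llama(model_path="DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf", n_ctx=4096)
print(llm("Hello!", max_tokens=16)["choices"][0]["text"])
```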