Xenova posted an update Aug 23
I can't believe this... Phi-3.5-mini (3.8B) running in-browser at ~90 tokens/second on WebGPU w/ Transformers.js and ONNX Runtime Web! 🤯 Since everything runs 100% locally, no messages are sent to a server, a huge win for privacy!
- 🤗 Demo: webml-community/phi-3.5-webgpu
- 🧑‍💻 Source code: https://github.com/huggingface/transformers.js-examples/tree/main/phi-3.5-webgpu
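
For anyone curious what this looks like in code, here's a minimal sketch of loading a chat model on WebGPU with Transformers.js v3 (`@huggingface/transformers`). The model id, dtype, and output handling below are assumptions on my part, so check the linked source for the exact values the demo uses.

```ts
import { pipeline } from "@huggingface/transformers";

// Load a text-generation pipeline on the GPU. The ONNX model id and the
// 4-bit "q4f16" dtype are assumed here; the demo's source has the real ones.
const generator = await pipeline(
  "text-generation",
  "onnx-community/Phi-3.5-mini-instruct-onnx-web",
  { device: "webgpu", dtype: "q4f16" }
);

// Chat-style input: an array of role/content messages.
const messages = [
  { role: "user", content: "Explain WebGPU in one sentence." },
];

const output = await generator(messages, { max_new_tokens: 128 });
// The pipeline echoes the conversation; the last message is the model's reply.
console.log(output[0].generated_text.at(-1).content);
```

The first download is the slow part; after that, the weights come out of browser storage (see the caching discussion further down).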

Why in the world would it be cool to run something in a browser when it can run locally using llama.cpp??


Why in the world would you bother installing llama.cpp when you can just open a webpage?

Depends on the GPU hardware tbh. You won't get 90 tokens/sec everywhere. :)

As a frontend dev, I'd say LLMs were not meant for the browser. You have to download the weights every time you reload the page. It's impressive that they run well in the browser, but I don't see any practical use cases.


You don't download the weights every time; they are usually stored in OPFS or IndexedDB.
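
This is easy to verify. Here's a minimal sketch using only standard web APIs (Cache Storage, `navigator.storage.estimate()`, and the OPFS root directory); nothing in it is specific to Transformers.js internals, and the cache names you'll see are whatever the app chose.

```ts
// Inspect where a web app has persisted model weights.
async function inspectModelStorage(): Promise<void> {
  // Cache Storage: list named caches and count the responses in each.
  for (const name of await caches.keys()) {
    const cache = await caches.open(name);
    const entries = await cache.keys();
    console.log(`cache "${name}": ${entries.length} entries`);
  }

  // Overall usage vs. quota for this origin (covers IndexedDB and OPFS too).
  const { usage, quota } = await navigator.storage.estimate();
  console.log(`using ${usage} of ${quota} bytes`);

  // OPFS: walk the top level of the origin-private file system.
  const root = await navigator.storage.getDirectory();
  for await (const [name, handle] of root.entries()) {
    console.log(`OPFS entry: ${name} (${handle.kind})`);
  }
}

inspectModelStorage();
```

Run it on the demo page after the model has loaded, then reload: the weights won't be fetched from the network again.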

I'm getting 11.24 tokens/second on a MacBook M1 Pro.