automatedstockminingorg posted an update Nov 2
Hi everyone, I have just uploaded my first fine-tuned model, but the serverless inference client isn't available for it. It's built on the Transformers architecture and is just a fine-tuned Llama 8B Instruct. Does anyone know how to make serverless inference available on a model?
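For reference, this is roughly the call I'd expect to work once serverless inference is enabled for the model (a minimal sketch; the repo id below is a placeholder, not my actual model):

```python
from huggingface_hub import InferenceClient

# Placeholder repo id -- replace with the actual fine-tuned model.
client = InferenceClient(model="my-username/my-llama-8b-instruct-finetune")

# Serverless chat completion; this fails if the Inference API
# is not available (or not warm) for the model.
response = client.chat_completion(
    messages=[{"role": "user", "content": "Hello, what can you do?"}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```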

The Serverless Inference API was significantly degraded a few months ago, making it almost unusable unless the model is labeled Warm.
The conditions under which a model becomes Warm are unknown, so it is safe to say you cannot reliably aim for it.
If you create a Space with Gradio, it may work, so you could try that.
https://www.gradio.app/guides/using-hugging-face-integrations
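A minimal sketch of that approach, following the gr.load integration pattern from the guide above (the repo id is a placeholder; this assumes the model repo is public):

```python
import gradio as gr

# gr.load with a "models/" prefix wraps a Hub model in a ready-made demo.
# Placeholder repo id -- replace with your own fine-tuned model.
demo = gr.load("models/my-username/my-llama-8b-instruct-finetune")
demo.launch()
```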


thanks

You can use Modal Labs to run inference. GPUs take ~30 seconds to provision. Here's a quickstart on the topic: https://modal.com/docs/examples/vllm_inference
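Roughly what a minimal Modal function looks like (a sketch only, not the full vLLM example from the link; the model id and GPU type are placeholders):

```python
import modal

# Container image with the inference dependencies preinstalled.
image = modal.Image.debian_slim().pip_install("transformers", "torch", "accelerate")

app = modal.App("llama-inference", image=image)

@app.function(gpu="A10G", timeout=600)
def generate(prompt: str) -> str:
    from transformers import pipeline

    # Placeholder repo id -- replace with your fine-tuned model.
    pipe = pipeline(
        "text-generation",
        model="my-username/my-llama-8b-instruct-finetune",
        device_map="auto",
    )
    return pipe(prompt, max_new_tokens=128)[0]["generated_text"]

@app.local_entrypoint()
def main():
    print(generate.remote("Hello, what can you do?"))
```

Run it with `modal run script.py` after installing and authenticating the Modal CLI.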
