Hugging Face org Sep 23, 2024


edited Sep 23, 2024

Inference Playground

This discussion is dedicated to feedback on the Inference Playground and the Serverless Inference API.

About the Inference Playground:

The Inference Playground is a user interface designed to simplify testing our Serverless Inference API with chat models. It lists the available models, lets you try them and experiment with each model's settings through the UI, and lets you copy code snippets.

To view all available settings, refer to the Serverless Inference for Chat Completion documentation.
Browse available chat models here.
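For readers who want to see what a Playground request boils down to, here is a minimal Python sketch that builds (but does not send) a chat completion request using only the standard library. The URL pattern is an assumption (the OpenAI-compatible `/v1/chat/completions` route the Serverless Inference API exposed for chat models at the time of this thread), the model ID is only an example, and the sampling values are illustrative, not the Playground's defaults. Check the Chat Completion documentation for the current endpoint and parameters.

```python
import json
import urllib.request

# Assumption: OpenAI-compatible chat route of the Serverless Inference API.
# Verify against the current Chat Completion docs before relying on it.
CHAT_URL = "https://api-inference.huggingface.co/models/{model_id}/v1/chat/completions"


def build_chat_request(model_id, messages, token, max_tokens=500, temperature=0.5, top_p=0.7):
    """Build, but do not send, a POST request for a chat completion."""
    payload = {
        "model": model_id,
        "messages": messages,          # [{"role": ..., "content": ...}, ...]
        "max_tokens": max_tokens,      # illustrative values, not the Playground defaults
        "temperature": temperature,
        "top_p": top_p,
    }
    return urllib.request.Request(
        CHAT_URL.format(model_id=model_id),
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # your HF access token
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_chat_request(
        "meta-llama/Llama-3.2-3B-Instruct",  # example model ID
        [{"role": "user", "content": "What is the capital of France?"}],
        token="hf_xxx",  # placeholder token
    )
    print(req.get_method(), req.full_url)
    # To actually send it: json.load(urllib.request.urlopen(req))
```

The Playground's own "copy code" snippets remain the reference for the exact client calls.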

If you need more usage, you can subscribe to PRO.

User Tier	Rate Limit
Unregistered Users	1 request per hour
Signed-up Users	50 requests per hour
PRO and Enterprise Users	500 requests per hour
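The tier limits above can be respected on the client side with a sliding one-hour window. This is a minimal Python sketch assuming limits are counted over a rolling hour; the server's actual accounting may differ (for example, fixed hourly windows), and the tier keys are names made up for this example.

```python
import collections
import time

# Requests per hour per tier, taken from the table above (keys are hypothetical names).
RATE_LIMITS = {
    "unregistered": 1,
    "signed_up": 50,
    "pro_enterprise": 500,
}
WINDOW_SECONDS = 3600.0


class HourlyLimiter:
    """Client-side sliding-window limiter (a sketch; server-side accounting may differ)."""

    def __init__(self, tier, clock=time.monotonic):
        self.limit = RATE_LIMITS[tier]
        self.clock = clock
        self.sent = collections.deque()  # timestamps of requests inside the window

    def _evict(self, now):
        # Forget requests that are at least one hour old.
        while self.sent and now - self.sent[0] >= WINDOW_SECONDS:
            self.sent.popleft()

    def try_acquire(self):
        """Record a request and return True if it fits the hourly budget, else False."""
        now = self.clock()
        self._evict(now)
        if len(self.sent) < self.limit:
            self.sent.append(now)
            return True
        return False

    def seconds_until_next(self):
        """Seconds to wait before another request would be allowed (0.0 if allowed now)."""
        now = self.clock()
        self._evict(now)
        if len(self.sent) < self.limit:
            return 0.0
        return WINDOW_SECONDS - (now - self.sent[0])
```

Injecting `clock` keeps the limiter testable without actually waiting an hour.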

Upcoming Features:

Continuous UI improvements
A dedicated UI for function calling
Support for vision language models
A feature to easily compare models

victor pinned discussion Sep 23, 2024

maharshpatelx

Sep 26, 2024

Add a place to change API keys in the Playground.

victor

Hugging Face org Sep 26, 2024


edited Sep 26, 2024

Add a place to change API keys in the Playground.

Yes, I'll try to add this today. Edit: I added it.

not-lain

Sep 27, 2024

Support tool use 🛠️

maharshpatelx

Sep 27, 2024

How can I use this 🤗 InferenceClient with LangChain or LlamaIndex?

victor

Hugging Face org Sep 30, 2024

How can I use this 🤗 InferenceClient with LangChain or LlamaIndex?

What do you mean? This is just a UI for easier testing and for getting the code to run inference on HF models.

Smorty100

Oct 9, 2024

Being able to use text completion like in Open WebUI would be great.

Testing function calling output would also be very appreciated. Actually calling the functions doesn't make any sense in this case, but generating the json for it would be very useful.

I am aware that this probably takes some work, since the chat templates need to be evaluated to implement it, but it would be great even if only for popular models (like the recent Llama 3B).
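To illustrate what "generating the JSON" for a function call means, here is a small Python sketch: a hypothetical `get_weather` tool described in the OpenAI-style `tools` format, plus a checker for the call a model generates. It assumes the model emits `{"name": ..., "arguments": {...}}` with the arguments as an object (OpenAI-style responses carry them as a JSON string, which you would `json.loads` first), and it only checks the tool name, required arguments, basic types, and enums. It is not a full JSON Schema validator.

```python
import json

# A hypothetical tool, described in the OpenAI-style "tools" format.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["city"],
            },
        },
    }
]

# Basic JSON Schema type names mapped to Python types.
_JSON_TYPES = {"string": str, "number": (int, float), "integer": int,
               "boolean": bool, "object": dict, "array": list}


def check_tool_call(raw, tools=TOOLS):
    """Parse a model-generated tool call; return a list of problems (empty means it looks valid)."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(call, dict):
        return ["tool call must be a JSON object"]
    specs = {t["function"]["name"]: t["function"]["parameters"] for t in tools}
    name = call.get("name")
    if name not in specs:
        return [f"unknown tool: {name!r}"]
    args = call.get("arguments", {})
    if not isinstance(args, dict):
        return ["arguments must be a JSON object"]
    schema = specs[name]
    problems = [f"missing argument: {r}" for r in schema.get("required", []) if r not in args]
    for key, value in args.items():
        prop = schema["properties"].get(key)
        if prop is None:
            problems.append(f"unexpected argument: {key}")
            continue
        expected = _JSON_TYPES.get(prop.get("type"))
        if expected is not None:
            # bool is a subclass of int in Python, so reject it for numeric types.
            wrong = not isinstance(value, expected) or (
                isinstance(value, bool) and prop["type"] != "boolean")
            if wrong:
                problems.append(f"{key} should be {prop['type']}")
                continue
        if "enum" in prop and value not in prop["enum"]:
            problems.append(f"{key} must be one of {prop['enum']}")
    return problems
```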

victor

Hugging Face org Oct 9, 2024


edited Oct 9, 2024

Being able to use text completion like in Open WebUI would be great.

What do you mean? I think that's HuggingChat, no?

Testing function calling output would also be very appreciated. Actually calling the functions doesn't make any sense in this case, but generating the json for it would be very useful.

Yes, we'll add that.

cfahlgren1

Hugging Face org Oct 10, 2024

This is awesome! One small nit: on an iPhone, compare mode doesn't show both models well.

Maybe a carousel-type (swipe) component to show the different models would work well there.

victor

Hugging Face org Oct 15, 2024

It should be a bit better on mobile now @cfahlgren1

Smorty100

Oct 21, 2024


edited Oct 21, 2024

@victor Maybe I am missing something, but as far as I know, HuggingChat does not have a text-completion feature.
What I am referring to is a feature where you provide some text and the model completes it, as base models tend to do. Like this:

User input

The following article will discuss the differences between lemon and carrot cake:
# Lemon cake vs Carrot cake!
**Lemon cake**
A delicious cake made of flour, some sugar and some other stuff too
**Carrot cake**

AI output

Another cake also made from flour, but with a carroty twist!
[...]

In Open WebUI, one can simply use any text model to continue the given input text.
I hope this clears up what I mean.
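The difference between the two modes fits in a few lines. With text completion, the prompt goes to the model verbatim. With chat, the messages are first rendered through the model's chat template. The Python sketch below renders a simplified ChatML layout (the `<|im_start|>`/`<|im_end|>` tokens that come up later in this thread); real templates such as Qwen2.5's add details like a default system prompt, so treat it as illustrative only.

```python
def chatml_prompt(messages, add_generation_prompt=True):
    """Render chat messages in a simplified ChatML layout."""
    rendered = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # Open an assistant turn so the model replies as the assistant.
        rendered += "<|im_start|>assistant\n"
    return rendered


if __name__ == "__main__":
    article = "# Lemon cake vs Carrot cake!\n**Lemon cake**\nA delicious cake made of flour\n**Carrot cake**\n"
    # Text completion: the model receives `article` verbatim and simply continues it.
    # Chat: the same text is wrapped in a user turn, followed by an open assistant turn.
    print(chatml_prompt([{"role": "user", "content": article}]))
```

In the chat case, the model answers the article as a request instead of continuing it, which is why a separate text-completion mode matters.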

EDIT: I have a question: does this inference playground "subtract" from the amount of compute we can spend on Spaces, or does it draw on another seemingly endless supply of compute, like HuggingChat does? (As far as I can tell, there is no limit to how many generations one can do in HuggingChat. I use it daily and haven't encountered any limit yet.)

victor

Hugging Face org Oct 22, 2024

Hi @Smorty100 ,

The playground is only available for instruct models at the moment (I don't know if we will add support for base models).
Yes, when you use the playground you're calling the endpoint with your HF token, so it counts against your quota. We'll be clarifying the limits soon.

Smorty100

Oct 24, 2024

Hi @victor ,

Instruction-tuned models can also complete text, just like base models. I just tested this with the granite3-moe-1B-instruct model via Ollama (in the Godot game engine).

Here is the test:

The model is prompted completely without any template; it simply continues what has already been said in a somewhat logical way.

The model I picked here is not the best, but I used it for its speed in this demo.

PSM24

Oct 24, 2024

Do you have any plans to add pay-as-you-go pricing per token?

And, would it be possible to support Qwen2.5-7B or 14B?

Smorty100

Nov 4, 2024

Currently, when accessing the playground from HuggingChat, it sometimes returns an Error 500. This happens when a model is not available on the playground but still has the playground button on HuggingChat.

Here is an example link that goes from Llama 3.2 Vision on HuggingChat straight to the playground.

Maybe add a check to the playground so that it falls back to a default model when the one in the URL isn't available, and shows a message such as "This model isn't available anymore."
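The suggested fallback is easy to sketch in Python. The query parameter name `modelId` and the model IDs are hypothetical, chosen only for illustration; the idea is to resolve the requested model against the list of models actually being served and return a notice instead of failing with a 500.

```python
from urllib.parse import parse_qs, urlparse


def model_from_url(url, param="modelId"):
    """Read the requested model from the URL; `modelId` is a hypothetical parameter name."""
    values = parse_qs(urlparse(url).query).get(param)
    return values[0] if values else None


def resolve_model(requested, available, default):
    """Return (model_id, notice); fall back to `default` if `requested` is not served."""
    if requested in available:
        return requested, None
    return default, f"{requested} isn't available anymore, showing {default} instead."
```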

Smorty100

Nov 4, 2024

[FEATURE REQUEST]

Please give us the option to write a prefix for the model response by typing something into its message field, and then let the LLM complete it.
This would give us the ability to steer the models even better.

Here is a short post I wrote: Why prefixes for LLMs can be real useful
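A response prefix (often called "prefilling") works by leaving the assistant turn open: the prompt ends with the start of the reply, so the model continues it instead of starting a fresh answer. This Python sketch assumes a ChatML-style template and that the rendered prompt is sent to a plain text-generation endpoint, since a chat endpoint renders and closes turns itself.

```python
def prefilled_chatml(messages, assistant_prefix):
    """Render a ChatML prompt whose final assistant turn is open and starts with `assistant_prefix`."""
    prompt = "".join(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages)
    # No <|im_end|> after the prefix, so the model keeps writing this same turn.
    return prompt + "<|im_start|>assistant\n" + assistant_prefix


def full_reply(assistant_prefix, generated):
    """The visible answer is the prefix followed by what the model generated after it."""
    return assistant_prefix + generated
```

For example, prefixing the reply with `["` nudges the model toward answering with a JSON list.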

John6666

Nov 4, 2024

Currently, when accessing the playground from HuggingChat, it sometimes returns an Error 500.

This may be a variant of the problem.
https://huggingface.co./posts/Tonic/169924015276572

Smorty100

Nov 6, 2024

We have the qwen2.5-1.5B base model (not instruction tuned) on the playground but we don't have any kind of text completion interface yet.
So as it stands right now, the LLM is probably being fed with all sorts of chat tokens like <|im_start|> and <|im_end|>, which it doesn't know what to do with.

I tried to give it some start of a sentence, but it just ended up token-dumping on me.
So either we need a text completion interface like I requested previously, or we need to ban base models from the playground.
I would prefer a text completion interface ~

Smorty100

Nov 6, 2024

Aaand here I am again with another problem.
When using Zephyr beta for chat, it usually responds with <user> and then writes a prompt a user might ask. It's surprising that Zephyr can even do that; I always thought we only train the LLM on the response side and never on the input side.

It seems like the chat templates are broken for many models in general. Some models simply don't have a chat template and return an error, some sort of work but not really (like Zephyr beta here), and sometimes even base models get a chat template, even though they haven't been instruction-tuned and thus don't know what to do with these chat tokens.

Is the playground pulling the templates for these from somewhere? Maybe some of the repos don't include the correct template?
But especially with Zephyr by HuggingFace, I would expect the template to be correct...
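A common mitigation for a model that keeps writing the user's next turn is a stop sequence: cut the output at the first role marker. This is a Python sketch; `zephyr_prompt` only approximates the `<|user|>`/`<|assistant|>` layout shown on the zephyr-7b-beta model card (check the repo's `tokenizer_config.json` for the exact template), and the stop list is an assumption that covers both `<|user|>` and the `<user>` variant described above.

```python
# Assumed stop list: Zephyr's end-of-turn token plus both user-marker spellings.
ZEPHYR_STOPS = ["</s>", "<|user|>", "<user>"]


def zephyr_prompt(messages):
    """Approximate Zephyr layout: role marker, newline, content, '</s>' per turn, then an open assistant turn."""
    turns = "".join(f"<|{m['role']}|>\n{m['content']}</s>\n" for m in messages)
    return turns + "<|assistant|>\n"


def truncate_at_stop(text, stops=ZEPHYR_STOPS):
    """Cut generated text at the earliest stop sequence, if any, and trim trailing whitespace."""
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].rstrip()
```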

victor

Hugging Face org Nov 20, 2024

Thank you for your feedback and sorry for the late reply.

We have the qwen2.5-1.5B base model (not instruction tuned) on the playground but we don't have any kind of text completion interface yet.

Mhhh, I don't think we have the base model, only the instruct one (Qwen/Qwen2.5-1.5B-Instruct), on hf.co/playground. It seems to work for me. Note that non-instruct models are not supported in the Playground.

But especially with Zephyr by HuggingFace I would expect the template to be correct...

Mhhh, same for me with <|user|>. I think it works in the model page widget, but I don't remember whether we did something special for Zephyr there. cc @mishig maybe.

cfahlgren1

Hugging Face org Dec 13, 2024

Small nit:

If you switch models, it deletes your system prompt. It would be awesome if it could be preserved across model switches, etc.

Smorty100

Dec 19, 2024

This one goes hand-in-hand with the feature request for function calling, but I'd like to see a JSON mode, like many other playgrounds have as well (for example, Cohere's playground).
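Until a server-side JSON mode exists, one client-side fallback is to pull the first JSON object out of whatever the model returned. This Python sketch relies only on `json.JSONDecoder.raw_decode`, which parses a value starting at a given index and ignores the rest of the string. It does not guarantee that the object matches any schema, which is what a real JSON mode would add.

```python
import json


def first_json_object(text):
    """Return the first parseable JSON object in `text`, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value beginning at `start` and ignores trailing text.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = text.find("{", start + 1)
    return None
```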

Smorty100

Dec 25, 2024

I use this playground a lot for iterating and improving my prompting on medium-sized LLMs (mostly llama3.2 3B).
Sadly, the iteration process takes a while, because every time I enter a different message, I first have to remove the previous generation before I can generate a new one with a mouse click; otherwise I get an error complaining that a generation is already in place.

This is very annoying. I would like to just press Ctrl+Enter a bunch of times while having the user message text box focused to check for good consistency with the prompts.
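The consistency check described here can also be scripted outside the UI. In this Python sketch, `generate` is a placeholder for any function that sends a prompt and returns the reply text (for example, a wrapper around a chat completion call); it is not a Playground API. The function reports the most common answer and how often it appeared across `n` runs.

```python
from collections import Counter


def consistency(generate, prompt, n=10):
    """Run `prompt` n times; return (most common answer, its share, all counts).

    `generate` is a placeholder callable: prompt -> reply text.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    outputs = [generate(prompt).strip() for _ in range(n)]
    counts = Counter(outputs)
    top_answer, top_count = counts.most_common(1)[0]
    return top_answer, top_count / n, counts
```

Exact string matching is strict; for free-form answers you would normalize the text or compare extracted fields instead.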

Also, and this is a very smol nit, but having a built-in Markdown UI for these discussion sections would be very nice. I can live with it, but other users might not even realize this supports Markdown. I only figured that out by uploading an image and noticing the formatting.

MoritzLaurer

Hugging Face org 27 days ago

Hey @victor and all, what do you think about adding functionality to hf.co/playground for creating and saving prompt templates on the Hub? Platforms like Langfuse or Braintrust have some form of template creation and storage. The one from Langfuse, e.g., is quite good imo (video). The prompt-templates library would provide a standard way of storing and reusing the templates on the Hub. I think this would make the Hub more interesting for users who don't necessarily train or share models, but for whom sharing and reusing prompts can be useful.
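To make the idea concrete, here is a minimal Python sketch of a shareable prompt template: a JSON record with a name, a description, and a template string with `$placeholders`, rendered with the standard library's `string.Template`. It is a stand-in for illustration only and does not reflect the prompt-templates library's actual file format or API.

```python
import json
from string import Template


def dump_template(name, template, description=""):
    """Serialize a prompt template to JSON (an illustrative format, not a Hub standard)."""
    return json.dumps(
        {"name": name, "description": description, "template": template}, indent=2
    )


def render(serialized, **variables):
    """Load a serialized template and fill its $placeholders.

    Template.substitute raises KeyError if a placeholder has no value.
    """
    spec = json.loads(serialized)
    return Template(spec["template"]).substitute(**variables)


if __name__ == "__main__":
    stored = dump_template(
        "summarize",
        "Summarize the following text in $n bullet points:\n$text",
        description="Hypothetical example template",
    )
    print(render(stored, n=3, text="The Inference Playground lets you test chat models."))
```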

kingofpop

6 days ago

Hi, for some reason llama models are not working there anymore. Can you fix them please?

[FEEDBACK] Inference Playground
