Tools support in Ollama

#11
by jago93 - opened

Amazing work @matteogeniaccio and thanks @RDson for adding it to Ollama :)

Quick question: it seems the model does not support tools in Ollama? https://ollama.com/vanilj/Phi-4

Screenshot 2025-01-07 at 14.13.09.png

That seems strange to me. Maybe it's just an odd template. What template should we use to enable tools in Ollama?

Thanks!

This is not easy to describe! Please look at this: "chat_template": "{% for message in messages %}{% if (message['role'] == 'system') %}{{'<|im_start|>system<|im_sep|>' + message['content'] + '<|im_end|>'}}{% elif (message['role'] == 'user') %}{{'<|im_start|>user<|im_sep|>' + message['content'] + '<|im_end|><|im_start|>assistant<|im_sep|>'}}{% elif (message['role'] == 'assistant') %}{{message['content'] + '<|im_end|>'}}{% endif %}{% endfor %}",

It does not support tools because the chat_template has no tool handling at all.

It only handles the system, user, and assistant roles.
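To make that concrete, here is a minimal sketch that mirrors the template above in plain Python (so it does not need Jinja). The function name `render_phi4_prompt` and the sample messages are only my own illustration, not part of Ollama or the model repo. It shows that a message with a "tool" role matches no branch and simply disappears from the prompt.

```python
def render_phi4_prompt(messages):
    """Mirror the chat_template quoted above using plain string formatting."""
    parts = []
    for message in messages:
        role, content = message["role"], message["content"]
        if role == "system":
            parts.append(f"<|im_start|>system<|im_sep|>{content}<|im_end|>")
        elif role == "user":
            # The user branch also opens the assistant turn.
            parts.append(
                f"<|im_start|>user<|im_sep|>{content}<|im_end|>"
                "<|im_start|>assistant<|im_sep|>"
            )
        elif role == "assistant":
            parts.append(f"{content}<|im_end|>")
        # Any other role (for example "tool") matches no branch and is dropped.
    return "".join(parts)


# Hypothetical conversation, just for illustration.
messages = [
    {"role": "system", "content": "You are an assistant."},
    {"role": "user", "content": "What time is it?"},
    {"role": "tool", "content": "13:25:45"},
]
prompt = render_phi4_prompt(messages)
print(prompt)
```

The printed prompt contains the system and user text but not the tool message, which is why a tool result has to be passed in some other way.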

However, the model is said to be very good at JSON output and code output.

So you can use a system prompt plus your own code to call tools.

In short, your prompt could look like this:

You are an assistant.
You can call tools. When you want to call a tool, output it in this format:
<call tool>
get_time()
</call tool>
or
<call tool>
search_google("Why do people need to eat")
</call tool>

The content inside the <call tool></call tool> tags is written the way Python code calls functions.

Then I will give you the tool's result in this format:
<tool return result>
2025/01/01 13:25:45
</tool return result>
or
<tool return result>
Title: Why do people need to eat. Content: Why do people need to eat. Link: xxxxx
Title: Why do people need to eat? Content: Why do people need to eat. Link: xxxxx
Title: Why do people need to eat. Content: Why do people need to eat. Link: xxxxx
</tool return result>

Then you write code that calls the LLM. When the reply contains a tool call tag, your code runs the tool and sends the result back to the LLM. When the reply does not contain a tool call tag, your code returns it to the user, that is, you. This is a simple way to understand the principle, but if you really do it this way, you may only be able to use the LLM from the command line, or you need to wrap it in an OpenAI-style HTTP API, or you need to modify the code of the tool you are using, such as OpenWebUI.
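A rough sketch of that loop, under a few assumptions of mine: `fake_llm` is a stand-in for the real model call (for example a request to Ollama), `get_time` returns a fixed value so the example is reproducible, and the helper names `run_tool_call` and `chat_with_tools` are made up for this post. Tool results are sent back as a user message because the template has no tool role. The call is parsed with `ast` instead of `eval`, so the model can only invoke functions you registered, with literal positional arguments.

```python
import ast
import re

# Matches <call tool>...</call tool>, ignoring letter case in the tag.
CALL_RE = re.compile(r"<call tool>\s*(.*?)\s*</call tool>", re.IGNORECASE | re.DOTALL)


def get_time():
    return "2025/01/01 13:25:45"  # fixed value so the example is reproducible


TOOLS = {"get_time": get_time}  # registry of tools the model may call


def run_tool_call(source):
    """Run a Python-style call such as get_time() without using eval()."""
    node = ast.parse(source.strip(), mode="eval").body
    if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
        raise ValueError(f"not a simple function call: {source!r}")
    func = TOOLS[node.func.id]  # raises KeyError for an unknown tool
    args = [ast.literal_eval(arg) for arg in node.args]  # literals only
    return func(*args)


def chat_with_tools(llm, messages, max_rounds=5):
    """Call the LLM until its reply no longer contains a tool call tag."""
    for _ in range(max_rounds):
        reply = llm(messages)
        messages.append({"role": "assistant", "content": reply})
        match = CALL_RE.search(reply)
        if match is None:
            return reply  # plain answer: hand it to the user
        result = run_tool_call(match.group(1))
        messages.append({
            "role": "user",  # the template has no tool role
            "content": f"<tool return result>\n{result}\n</tool return result>",
        })
    raise RuntimeError("too many tool rounds")


def fake_llm(messages):
    """Stand-in for the real model: asks for the time once, then answers."""
    last = messages[-1]["content"]
    if "<tool return result>" in last:
        return "The current time is " + last.splitlines()[1] + "."
    return "<call tool>\nget_time()\n</call tool>"


answer = chat_with_tools(fake_llm, [{"role": "user", "content": "What time is it?"}])
print(answer)
```

With a real model you would put the tool instructions in the system message and replace `fake_llm` with your own function that sends `messages` to the model and returns the reply text.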

This is a bit cumbersome.

But most tool-calling implementations rely on tool tags in the chat_template, that is, on specific tokens.

I found another open source project: https://huggingface.co/blog/smolagents

👆 This open source project lets an LLM make tool calls without using the tool tags of a traditional chat_template.
I guess its implementation is the same as what I described.

But it is possible to expose it as an OpenAI API.

In other words, the architecture is like this.

ollama -> smolagents -> openwebui (your web UI)

The challenge is that smolagents needs to support connecting to Ollama, and your interface needs to support the OpenAI API or support connecting to smolagents.

Maybe you need to switch to an interface that supports smolagents, such as OpenWebUI.

If you talk to the LLM through code, you can connect to smolagents directly or implement it yourself. Take a look at smolagents first; it implements the core of tool support for LLMs.

I don't think the template can be changed. An LLM is essentially a text generation model, so if you change the template, it may not recognize the new things you added.

I tried changing the role from user to another name, but it didn't work.

On a side note, LLMs have been widely used since ChatGPT was released.

But most people who use LLMs are using the OpenAI API.

They don't know how an LLM works, how to deploy it locally, or how to fine-tune it.

This seems to be a kind of knowledge that is not common on the Internet, difficult to understand, and always full of various formulas.

For example, many people still use "long-term memory" and "short-term memory" to describe the dialogue model.

There is no memory at all!

Everything is sent to the LLM every time. The LLM uses the attention mechanism to attend to the previous text and then generates new text.

Whatever is not sent to the LLM, the LLM knows nothing about.

As the conversation gets longer, every turn resends the whole history, so the total tokens spent grow much faster than the conversation itself (roughly quadratically in the number of turns).

Especially for long-context models, attention and token costs are huge problems.
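A small back-of-the-envelope calculation shows how the cost adds up. It assumes every message (user or assistant) is 100 tokens and that the full history is resent on every turn; both numbers and the function name `tokens_sent` are only illustrative. Under these assumptions the total grows with the square of the number of turns.

```python
def tokens_sent(turns, tokens_per_message=100):
    """Total prompt tokens sent when the whole history is resent each turn."""
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_message  # the new user message joins the history
        total += history               # the whole history is sent as the prompt
        history += tokens_per_message  # the assistant reply joins the history
    return total


for turns in (10, 20, 40):
    print(turns, tokens_sent(turns))
```

Doubling the number of turns from 10 to 20 raises the total from 10,000 to 40,000 prompt tokens, four times as many.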

The process of training an LLM is also very interesting. First it is trained on a lot of arbitrary text, and then on question-answer pairs in a specific format.

Training also uses some bad question-answer pairs whose answers are marked as rejected, to achieve human alignment.

You will find that the model can really only do text completion.

For example:

I want to eat...

It can complete: cake, pizza, KFC

Later, someone trained it on question-answer pairs in a specific format. Unexpectedly, it was still just completing text, but we read the completion as the answer.

For example
<|Question|> What cake is delicious? <|End Question|><|Answer|>

It will complete:
7-layer cake, and candles <|End Answer|>

You can try it with the pre-trained (base) model of Llama 3.1. Any text completion model has the potential to answer questions.

PS: With the rise of large models, the datasets used to train text completion models may have been polluted by the output of question-answering models.

This kind of knowledge is interesting, but I feel that many people don’t know it.

How about making a tutorial? Will it be profitable? I am facing a financial crisis.
