chat template doesn't include tools

#3
by copasseron - opened

Hi mistral team,

nice to see a new model from you guys, thanks a lot.

https://huggingface.co./mistralai/Mistral-Small-24B-Instruct-2501/blob/main/tokenizer_config.json#L9010

In the Jinja chat template there is nothing related to tools (neither the list of available tools, nor a way to put tool results in the message history sent to the model). Is this intended?

Ollama does include it on their side:

https://ollama.com/library/mistral-small/blobs/5de2b8ebfbdd

{{- range $index, $_ := .Messages }}
{{- if eq .Role "system" }}[SYSTEM_PROMPT] {{ .Content }}[/SYSTEM_PROMPT]
{{- else if eq .Role "user" }}
{{- if and (le (len (slice $.Messages $index)) 2) $.Tools }}[AVAILABLE_TOOLS] {{ $.Tools }}[/AVAILABLE_TOOLS]
{{- end }}[INST] {{ .Content }}[/INST]
{{- else if eq .Role "assistant" }}
{{- if .Content }} {{ .Content }}
{{- if not (eq (len (slice $.Messages $index)) 1) }}</s>
{{- end }}
{{- else if .ToolCalls }}[TOOL_CALLS] [
{{- range .ToolCalls }}{"name": "{{ .Function.Name }}", "arguments": {{ .Function.Arguments }}}
{{- end }}]</s>
{{- end }}
{{- else if eq .Role "tool" }}[TOOL_RESULTS] {"content": {{ .Content }}}[/TOOL_RESULTS]
{{- end }}
{{- end }}
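For anyone templating prompts by hand, the Ollama logic above can be sketched in Python roughly as follows. This is a minimal sketch of the token layout implied by that template, not the official tokenizer's implementation; the exact whitespace, special-token handling, and the "tools before one of the last two messages" rule are simplified (here tools are emitted just before the last user message):

```python
import json

def build_prompt(messages, tools=None):
    """Render a message list into Mistral-style prompt text,
    roughly following the token layout of the Ollama template above."""
    out = []
    # The Ollama template emits [AVAILABLE_TOOLS] just before the final
    # user turn; we approximate that by finding the last user message.
    last_user = max(
        (i for i, m in enumerate(messages) if m["role"] == "user"),
        default=-1,
    )
    for i, m in enumerate(messages):
        role = m["role"]
        if role == "system":
            out.append(f"[SYSTEM_PROMPT] {m['content']}[/SYSTEM_PROMPT]")
        elif role == "user":
            if tools and i == last_user:
                out.append(f"[AVAILABLE_TOOLS] {json.dumps(tools)}[/AVAILABLE_TOOLS]")
            out.append(f"[INST] {m['content']}[/INST]")
        elif role == "assistant":
            if m.get("tool_calls"):
                calls = ", ".join(
                    json.dumps({"name": c["name"], "arguments": c["arguments"]})
                    for c in m["tool_calls"]
                )
                out.append(f"[TOOL_CALLS] [{calls}]</s>")
            elif m.get("content"):
                # The template omits </s> after the very last assistant turn.
                closing = "</s>" if i != len(messages) - 1 else ""
                out.append(f" {m['content']}{closing}")
        elif role == "tool":
            out.append(f"[TOOL_RESULTS] {json.dumps({'content': m['content']})}[/TOOL_RESULTS]")
    return "".join(out)
```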

thanks a lot

Mistral AI org

We've tested function calling only with vLLM: https://huggingface.co./mistralai/Mistral-Small-24B-Instruct-2501#function-calling
The model should work very well for function calling tasks!

Can you give this a try?

Also, we'd be more than happy about any contribution to make function calling work with HF format.
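As a starting point for such a contribution, the missing branches could look roughly like the sketch below, mirroring the Ollama logic quoted above. This is untested, and the key names (`tool_calls`, `function`, `arguments`) assume the transformers tool-calling message schema; the real template would need to integrate this into its existing message loop:

```jinja
{%- if message['role'] == 'tool' %}
    {{- '[TOOL_RESULTS] {"content": ' + message['content'] | tojson + '}[/TOOL_RESULTS]' }}
{%- elif message['role'] == 'assistant' and 'tool_calls' in message %}
    {{- '[TOOL_CALLS] [' }}
    {%- for call in message['tool_calls'] %}
        {{- '{"name": "' + call['function']['name'] + '", "arguments": ' + call['function']['arguments'] | tojson + '}' }}
        {%- if not loop.last %}{{- ', ' }}{%- endif %}
    {%- endfor %}
    {{- ']' + eos_token }}
{%- endif %}
```

An `[AVAILABLE_TOOLS]` block before the last user turn would also be needed, as in the Ollama template.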

It was working fine before, but a recent commit added strftime, and it is now broken on Text-Generation-Inference (TGI).

@patrickvonplaten I was going to test that today as well. Does it work without applying the template extension from the OP, or did you include it?

Also, did you try this on the OpenAI-compatible vLLM endpoint, or just offline inference?

On TGI, it works without the OP's template. Now it is broken after they included strftime.

On TGI, it works without the OP's template. Now it is broken after they included strftime.

The latest TGI commit fixes this.

But regarding the original topic, I'm getting this error when using tool calling: Template error: syntax error: Only user, system and assistant roles are supported!

Yes, that is my concern also.

I'm deploying the model with NVIDIA Triton + the vLLM backend, so I can't use vLLM's LLM.chat() endpoint.

The vLLM backend of Triton uses https://docs.vllm.ai/en/v0.6.5/dev/engine/async_llm_engine.html, which takes the text directly before passing it to the tokenizer.

So I'm obliged to template the messages myself first, either by applying the Jinja template directly or by using transformers' apply_chat_template() method (https://huggingface.co./docs/transformers/v4.37.1/chat_templating), which uses the chat template here: https://huggingface.co./mistralai/Mistral-Small-24B-Instruct-2501/blob/20b2ed1c4e9af44b9ad125f79f713301e27737e2/tokenizer_config.json#L9010.

However, the chat template provided for this new model doesn't support tools (neither tool responses in the message history, nor the list of available tools).
