Text Generation
Transformers
Safetensors
llama
text-generation-inference
Inference Endpoints

Inference servers fail with the multiple chat templates

#5
by stelterlab - opened

Hi!

I tried to fire the model with inference servers as vLLM and llama.cpp (as GGUF) and both fail when detecting the multiple chat templates for the different languages.

vllm        | ERROR 11-27 06:51:09 serving_chat.py:170] ValueError: This model has multiple chat templates with no default specified! Please either pass a chat template or the name of the template you wish to use to the `chat_template` argument. Available template names are ['BG', 'CS', 'DA', 'DE', 'EL', 'EN', 'ES', 'ET', 'FI', 'FR', 'GA', 'HR', 'HU', 'IT', 'LT', 'LV', 'MT', 'NL', 'PL', 'PT', 'RO', 'SK', 'SL', 'SV']

You can not choose the template names. You can only specify a complete chat template as alternative.

A more generic chat template as used by Mistral & Co would solve this.

May approach:

{%- if messages[0]["role"] == "system" %}
{{- messages[0]['role']|capitalize + ': ' + messages[0]['content'] + '\\n' }}
{%- set loop_messages = messages[1:] %}
{%- else %}
System: Ein Gespräch zwischen einem Menschen und einem Assistenten mit künstlicher Intelligenz. Der Assistent gibt hilfreiche und höfliche Antworten auf die Fragen des Menschen.{{- '\\n'}}
{%- set loop_messages = messages %}
{%- endif %}
{%- for message in loop_messages %}
{%- if (message['role']|lower == 'user') != (loop.index0 % 2 == 0) %}
{{- raise_exception('Roles must alternate User/Assistant/User/Assistant/...') }}
{%- endif %}
{%- if message['role']|lower == 'user' %}
{{- message['role']|capitalize + ': ' + message['content'] + '\\n' }}
{%- elif message['role']|lower == 'assistant' %}
{{- message['role']|capitalize + ': ' + message['content'] + eos_token + '\\n' }}
{%- else %}
{{- raise_exception('Only user and assistant roles are supported!') }}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- 'Assistant: '}}
{%- endif %}

This worked for me. As far as I know the default fixed System prompt I added as fallback is not standard in those other templates.

Kind regards,

Christian Stelter

The problem is that the tokenizer config of this model does not even specify a chat template. https://huggingface.co./openGPT-X/Teuken-7B-instruct-research-v0.4/blob/main/tokenizer_config.json

It would be great if the authors could specify what instruction template was used during training.

Hi @stelterlab , thanks for your investigations. We now added a default system prompt, which is used, if no chat_template language is provided: https://huggingface.co./openGPT-X/Teuken-7B-instruct-research-v0.4/commit/0aaf3bb89362d880fe0495aa142d10cfb61f7419

@mbrack thanks for pointing this out. We have specified the chat template directly within the Tokenizer file: https://huggingface.co./openGPT-X/Teuken-7B-instruct-research-v0.4/blob/main/gptx_tokenizer.py#L458

We also added some sample for the usage with vLLM to the Readme: https://huggingface.co./openGPT-X/Teuken-7B-instruct-research-v0.4#usage-with-vllm-server

In case you want to test custom system messages, you could specify a custom chat template with vLLM as described here https://huggingface.co./openGPT-X/Teuken-7B-instruct-research-v0.4#usage-with-vllm-server and set the --chat-template parameter to the Jinja-Template above or to the following Jinja-Template:

{%- if messages[0]["role"] == "system" %}
{{- messages[0]['role']|capitalize + ': ' + messages[0]['content'] + '\\n' }}
{%- set loop_messages = messages[1:] %}
{%- else %}
System: Ein Gespräch zwischen einem Menschen und einem Assistenten mit künstlicher Intelligenz. Der Assistent gibt hilfreiche und höfliche Antworten auf die Fragen des Menschen.{{- '\\n'}}
{%- set loop_messages = messages %}
{%- endif %}
{%- for message in loop_messages %}
{%- if message['role']|lower == 'user' %}
{{- message['role']|capitalize + ': ' + message['content'] + '\\n' }}
{%- elif message['role']|lower == 'assistant' %}
{{- message['role']|capitalize + ': ' + message['content'] + '</s>' + '\\n' }}
{%- else %}
{{- raise_exception('Only user and assistant roles are supported!') }}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- 'Assistant: '}}
{%- endif %}

Nonetheless, we suggest using these system messages listed here https://huggingface.co./openGPT-X/Teuken-7B-instruct-research-v0.4/blob/main/gptx_tokenizer.py#L434, as we used this selection of system messages for training this instruction-tuned model.

Sign up or log in to comment