Translation capability broken
mistralai/Mistral-7B-Instruct-v0.1
can translate to Croatian easily, but zephyr-7b-alpha responds with: "Unfortunately, I am not capable of translating the summary into Croatian."
Can you please provide an example input where this happens so I can test it as well?
For context, Zephyr was trained on English datasets (UltraChat and UltraFeedback) and Mistral's Instruct model potentially had some multilingual data that makes it better suited for translation tasks.
@lewtun
Sure, here is the exact prompt I used:
system_prompt = "You are a friendly assistant who knows all the books in the world. Respond to user queries clearly, strictly following the context of the question, and provide an answer within the requested number of words"
user_message = "summarize the book Master and Margarita in just 1000 words"
user_message2 = "then translate it to Croatian"
messages=f""""<s>[INST] {system_prompt} [/INST]
[INST] {user_message} [/INST]
[INST] {user_message2} [/INST]</s>"""
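(For comparison, here's a rough sketch of building the same prompt with Mistral's own chat template via transformers; as far as I can tell, v0.1's template only accepts alternating user/assistant turns, so I fold the system prompt into the first user message.)
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
mistral_messages = [
    # v0.1 has no system role, so the system prompt is prepended to the user turn
    {"role": "user", "content": f"{system_prompt}\n\n{user_message} {user_message2}"},
]
print(tokenizer.apply_chat_template(mistral_messages, tokenize=False))
# -> "<s>[INST] ... [/INST]"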
Ah, we use a different prompt template which might explain the issue. Can you try with this:
import torch
from transformers import pipeline

pipe = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-alpha", torch_dtype=torch.bfloat16, device_map="auto")
# We use the tokenizer's chat template to format each message - see https://huggingface.co./docs/transformers/main/en/chat_templating
messages = [
    {
        "role": "system",
        "content": "You are a friendly assistant who knows all the books in the world. Respond to user queries clearly, strictly following the context of the question, and provide an answer within the requested number of words",
    },
    {"role": "user", "content": "summarize the book Master and Margarita in just 1000 words"},
]
# Generate 1st turn
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95, return_full_text=False)
print(outputs[0]["generated_text"])
messages.append({"role": "assistant", "content": outputs[0]["generated_text"]})
messages.append({"role": "user", "content": "then translate it to Croatian"})
# Generate 2nd turn
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95, return_full_text=False)
print(outputs[0]["generated_text"])
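For reference, the first-turn prompt string this produces should look roughly like this (my sketch of what apply_chat_template renders for zephyr-7b-alpha; exact newlines may differ):
<|system|>
You are a friendly assistant who knows all the books in the world. Respond to user queries clearly, strictly following the context of the question, and provide an answer within the requested number of words</s>
<|user|>
summarize the book Master and Margarita in just 1000 words</s>
<|assistant|>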
Btw the demo seems to roughly understand the instruction: https://huggingfaceh4-zephyr-chat.hf.space
@lewtun
thanks for checking!
I tried using your template but got exactly the same result as before.
But my inference engine is vLLM, which uses FastChat for prompt templates, and apparently the FastChat template for Zephyr is incorrect; the broken generation comes from that.
Btw, the FastChat template for Mistral also contains some garbage, which is why I used Mistral's string template instead of the message array.
When I used the Mistral template and the same approach as you (appending the generated English answer to the context), Zephyr was able to generate the correct Croatian translation from that context.
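A quick way to see what FastChat actually resolves is something like this (a sketch; get_conversation_template is FastChat's public helper, but which templates are registered depends on your FastChat version):
from fastchat.model import get_conversation_template

conv = get_conversation_template("HuggingFaceH4/zephyr-7b-alpha")
conv.append_message(conv.roles[0], "summarize the book Master and Margarita in just 1000 words")
conv.append_message(conv.roles[1], None)
print(conv.name)          # which registered template was matched
print(conv.get_prompt())  # the exact string vLLM would send to the model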
I found the right template to use as a string prompt:
system_prompt = "You are a friendly assistant who knows all the books in the world. Respond to user queries clearly, strictly following the context of the question, and provide an answer within the requested number of words"
user_message = "summarize the book Master and Margarita in just 1000 words"
user_message2 = "then translate it to Croatian"
messages = f"""<|system|>{system_prompt}</s>
<|user|>{user_message} {user_message2}</s>
<|assistant|>"""
And it seems Zephyr can't translate without the English context being added, but Mistral can.
Zephyr can produce just the Croatian translation when I modify the prompt like this:
messages = f"""<|system|>{system_prompt}</s>
<|user|>{user_message}</s>
<|assistant|></s>
<|user|>{user_message2}</s>
<|assistant|>"""
So the goal is to get both the English and the Croatian response at once, from a single prompt and request.