Has a massive repetition problem

#29
by Delcos - opened

Even in the demo hosted on Hugging Face, the model has a massive repetition problem. If it's left to write out a document or anything of any complexity, it usually tails off into repetition, even with a higher repetition penalty.

"My name is Teven and I am a 17 year old student from the Netherlands. I am currently in my last year of high school and I am planning to study computer science at the university. I have been interested in computers and technology for as long as I can remember. I have always been fascinated by the way computers work and how they can be used to solve problems.

I have been programming since I was 12 years old and I have always enjoyed it. I have used a variety of programming languages, including Python, Java, and C++. I have also used a variety of software development tools, including Eclipse, Visual Studio, and NetBeans.

I have always been interested in learning new things and I am always looking for new challenges. I am always looking for ways to improve my skills and I am always looking for new ways to use my skills. I am always looking for new ways to use my skills to help others.

I am always looking for new ways to use my skills to help others. I am always looking for new ways to use my skills to"

I noticed this too. Increasing the temperature helps.
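For anyone wondering what those two knobs actually do, here is a rough toy sketch in plain Python, with no model involved. The logits and function names are made up for illustration, and the penalty rule is the common CTRL-style one (divide positive logits of already-seen tokens, multiply negative ones), which is how I understand the `repetition_penalty` option in transformers to behave.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def apply_repetition_penalty(logits, seen_ids, penalty):
    """Make already-generated tokens less likely (CTRL-style rule)."""
    out = list(logits)
    for i in set(seen_ids):
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out

# Hypothetical next-token logits; token 0 is the one stuck in the loop.
logits = [4.0, 2.0, 1.0, 0.5]
seen = [0, 0, 0]

p_base = softmax(logits)
p_hot = softmax(logits, temperature=1.5)
p_pen = softmax(apply_repetition_penalty(logits, seen, penalty=1.3))

print(round(p_base[0], 3), round(p_hot[0], 3), round(p_pen[0], 3))
```

Both settings lower the probability of the repeated token, but in this toy case it is still the most likely one, so greedy decoding would keep picking it. That seems consistent with the reports here that a higher penalty alone does not always break the loop.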

Does it keep on generating text until it reaches the max_generated_tokens every time?

I have a problem where generation keeps going until it reaches max_generated_tokens on a Tesla V100S GPU, but a Colab A100 works OK. Any idea what can be done about this?

You are running the exact same script + environment + component versions and you are getting different results with different GPUs?

OK, I double-checked this. The problem occurred with my QLoRA fine-tuned model. The base model works OK on both GPUs.

@adriata did you eventually figure out how to deal with the repetition problem with the Mistral QLoRA fine-tuned model?
I also ran into this problem when I tried to fine-tune on a Chinese-language instruction dataset. So far no solution has resolved it ...🥺

Yeah, I am also running into the same issue, even after increasing the temperature and penalty. Does the same problem occur with Mistral-7b-v0.2 or with the Mistral-7b-instruct model?

Hi, remember to set PAD token != EOS during finetuning.

Hi, could you explain more? Thanks!

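During fine-tuning, the model can 'forget' about the EOS token when we set it as the pad token. It's because of masking: padding positions are masked out of the loss, so the model is never trained to predict them, and that includes EOS when it shares an id with the pad token.
IDK what you use for fine-tuning, but setting this helped me with repetition: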

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    base_model_id,  # id or path of the base model you are fine-tuning
    model_max_length=4096,
    padding_side="left",
    add_eos_token=True,  # add eos at the end of text
)

# tokenizer.pad_token = tokenizer.eos_token  # don't use this
tokenizer.pad_token = tokenizer.unk_token  # use this
# model is the already loaded base model
model.config.pad_token_id = tokenizer.pad_token_id

My dummy fine-tuning sample:
https://github.com/atadria/llm_calculator/blob/main/mistral_finetune.ipynb
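To make the masking point concrete, here is a minimal sketch in plain Python of what I assume is happening. The token ids and helper names are hypothetical; the only real conventions used are that label value -100 is skipped by PyTorch's cross-entropy loss by default, and that collators commonly set the label to -100 wherever the input equals the pad token id.

```python
IGNORE_INDEX = -100  # label value that PyTorch cross-entropy ignores by default

def left_pad(ids, length, pad_token_id):
    """Pad a sequence on the left up to the requested length."""
    return [pad_token_id] * (length - len(ids)) + ids

def build_labels(input_ids, pad_token_id):
    """Mask every position whose id equals pad_token_id out of the loss."""
    return [IGNORE_INDEX if t == pad_token_id else t for t in input_ids]

# Hypothetical ids: 2 = EOS, 0 = UNK, anything else = ordinary text tokens.
EOS_ID, UNK_ID = 2, 0
text_ids = [11, 12, 13, EOS_ID]  # a training sample that ends with EOS

# Case 1: pad token is EOS, so the real EOS at the end is masked too.
labels_a = build_labels(left_pad(text_ids, 6, EOS_ID), EOS_ID)

# Case 2: pad token is UNK, so the real EOS stays in the labels.
labels_b = build_labels(left_pad(text_ids, 6, UNK_ID), UNK_ID)

eos_supervised_a = EOS_ID in labels_a
eos_supervised_b = EOS_ID in labels_b

print(labels_a)  # [-100, -100, 11, 12, 13, -100]
print(labels_b)  # [-100, -100, 11, 12, 13, 2]
```

In the first case the model never gets a training signal to emit EOS, which would explain why a fine-tuned model keeps generating until the token limit. Whether this applies to your setup depends on how your collator builds labels, so it is worth printing a batch and checking.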

What value did you set for pad_token_id? Could you explain more about what pad_token_id does?

I also read this article: https://huggingface.co./NousResearch/Yarn-Mistral-7b-128k/discussions/3, but I am confused about what values to set for eos_token_id and pad_token_id. Has anyone played around with these two parameters and gotten the issue resolved? Please let me know.

Hi @Rmote6603, did you manage to solve this issue? I'm facing the same problem myself. Thanks!