Model Generating Prefix

#85
by dongyulin - opened

I am trying to get rid of "<|start_header_id|>assistant<|end_header_id|>" and "<|eot_id|>" in the output streamed through TextIteratorStreamer. Is there any way to do this at generation time?

from transformers import TextIteratorStreamer

streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)

model.generate(model_inputs, streamer=streamer, pad_token_id=tokenizer.eos_token_id, max_new_tokens=1000, do_sample=True)
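If the markers do show up in the decoded text, they can also be stripped after the fact. A minimal sketch in plain Python, assuming the Llama 3 style markers exactly as quoted above; `strip_llama3_markers` is a hypothetical helper name, and unlike `skip_special_tokens` it also removes the role name (`assistant`) that sits between the header tags:

```python
import re

# Assumed marker format: Llama 3 style tags as quoted in this thread.
# The header pattern also consumes the role name and the blank line after it.
HEADER_RE = re.compile(r"<\|start_header_id\|>.*?<\|end_header_id\|>\s*", re.DOTALL)
EOT_RE = re.compile(r"<\|eot_id\|>")


def strip_llama3_markers(text: str) -> str:
    """Remove role headers (including the role name) and end-of-turn markers."""
    text = HEADER_RE.sub("", text)
    return EOT_RE.sub("", text)


raw = "<|start_header_id|>assistant<|end_header_id|>\n\nHello!<|eot_id|>"
print(strip_llama3_markers(raw))  # Hello!
```

This is meant for a complete string, such as the text accumulated from the streamer; applied chunk by chunk, it would miss a header whose opening and closing tags arrive in different chunks.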

I assume you ended up finding an answer, but I was wondering the same thing and found that

streamer = TextIteratorStreamer(
    tokenizer,
    skip_prompt=True,
    skip_special_tokens=True,  # extra keyword arguments are forwarded to tokenizer.decode
)
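As far as I can tell from recent `transformers` releases, `TextIteratorStreamer` takes `tokenizer`, `skip_prompt` and `timeout`, and collects any other keyword arguments in `**decode_kwargs`, which it passes on to `tokenizer.decode`. So `skip_special_tokens=True` can be given directly; passing `decode_kwargs=dict(...)` instead nests the option one level deeper, where `decode` will likely not see it. The plain-Python sketch below shows that nesting with a stand-in class (`DecodeKwargsDemo` is hypothetical and only illustrates Python's `**kwargs` behaviour; it is not part of `transformers`):

```python
from typing import Any, Dict


class DecodeKwargsDemo:
    """Hypothetical stand-in, NOT the transformers class: it only mimics a
    constructor that gathers extra keyword arguments into **decode_kwargs."""

    def __init__(self, tokenizer: Any = None, skip_prompt: bool = False, **decode_kwargs: Any) -> None:
        self.skip_prompt = skip_prompt
        # In the real streamer, this dict would be unpacked into tokenizer.decode(...).
        self.decode_kwargs: Dict[str, Any] = decode_kwargs


direct = DecodeKwargsDemo(None, skip_prompt=True, skip_special_tokens=True)
nested = DecodeKwargsDemo(None, skip_prompt=True, decode_kwargs=dict(skip_special_tokens=True))

print(direct.decode_kwargs)  # {'skip_special_tokens': True}
print(nested.decode_kwargs)  # {'decode_kwargs': {'skip_special_tokens': True}}
```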

removes the prompt and almost all special tokens, and

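def stream_text(streamer):  # illustrative wrapper name; `yield` needs an enclosing generator
    generated_text = ""
    for new_text in streamer:
        # Drop any end-of-turn marker left in the chunk before using it.
        new_text = new_text.replace("<|eot_id|>", "")
        print(new_text, end="", flush=True)
        generated_text += new_text
        yield new_text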

removes "<|eot_id|>".
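The `replace` approach assumes each marker arrives whole inside a single chunk. To guard against a marker being split across two chunks, one option is to hold back any trailing text that could still be the start of a marker. A minimal sketch in plain Python; `filter_markers` is a hypothetical helper, and the default marker tuple is an assumption based on the token quoted in this thread:

```python
from typing import Iterable, Iterator, Tuple

# Assumed default: the end-of-turn marker quoted in this thread.
DEFAULT_MARKERS: Tuple[str, ...] = ("<|eot_id|>",)


def filter_markers(chunks: Iterable[str], markers: Tuple[str, ...] = DEFAULT_MARKERS) -> Iterator[str]:
    """Yield text with complete markers removed, even when a marker spans two chunks.
    Markers are assumed to be non-empty strings."""
    longest = max(map(len, markers), default=0)
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Remove complete markers (repeat in case a removal exposes another one).
        while any(m in buffer for m in markers):
            for marker in markers:
                buffer = buffer.replace(marker, "")
        # Hold back the longest suffix that could still grow into a marker.
        hold = 0
        for k in range(1, min(longest, len(buffer)) + 1):
            if any(m.startswith(buffer[-k:]) for m in markers):
                hold = k
        if len(buffer) > hold:
            yield buffer[: len(buffer) - hold]
            buffer = buffer[len(buffer) - hold :]
    # Flush what is left when the stream ends; it contains no complete marker.
    if buffer:
        yield buffer


chunks = ["Hello", " world<|eo", "t_id|>"]
print(list(filter_markers(chunks)))  # ['Hello', ' world']
```

Since a `TextIteratorStreamer` is iterated as a sequence of strings, it could be wrapped as `for text in filter_markers(streamer): ...`. Holding back a short suffix delays a few characters per chunk, which is the trade-off for never emitting half a marker.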
