Curious
Hey! Just curious, when you used it did you have stuff like this pop up?
[control_715]
I seem to get that frequently, with different numbers, followed by the character's name and a colon.
Hey! I've never seen that come up, but I've been using the model via exllamav2. I haven't tried running inference on the full weights directly.
That looks very similar to the padding token (https://huggingface.co./gghfez/Writer-Large-2411-v2.1/blob/main/special_tokens_map.json).
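For reference, here's a quick sketch (untested) of how you could check whether [control_715] is actually a token in this tokenizer and what the declared pad token is, before the full diff below:

from transformers import AutoTokenizer

# Sketch, untested: does [control_715] exist in the vocab, and what is the pad token?
tok = AutoTokenizer.from_pretrained("gghfez/Writer-Large-2411-v2.1")
print("[control_715] id:", tok.convert_tokens_to_ids("[control_715]"))
print("pad token:", tok.pad_token, "->", tok.pad_token_id)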
I've just tried "diffing" the tokenizer against the original from Mistral-Large-2411:
from transformers import AutoTokenizer

writer_tokenizer = AutoTokenizer.from_pretrained("gghfez/Writer-Large-2411-v2.1")
large_tokenizer = AutoTokenizer.from_pretrained("gghfez/Mistral-Large-Instruct-2411")

print("Writer vocab size:", len(writer_tokenizer))
print("Large vocab size:", len(large_tokenizer))
print("Writer special tokens:", writer_tokenizer.special_tokens_map)
print("Large special tokens:", large_tokenizer.special_tokens_map)

# The same string should tokenize identically if nothing changed
test_string = "Hello, world!"
print("Writer tokenization:", writer_tokenizer.encode(test_string))
print("Large tokenization:", large_tokenizer.encode(test_string))
Couldn't see a significant difference in the output:
Writer vocab size: 32768
Large vocab size: 32768
Writer special tokens: {'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>', 'pad_token': '[control_746]'}
Large special tokens: {'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>'}
Writer tokenization: [1, 16998, 29493, 2294, 29576]
Large tokenization: [1, 16998, 29493, 2294, 29576]
And the vocab:
writer_vocab = set(writer_tokenizer.get_vocab().keys())
large_vocab = set(large_tokenizer.get_vocab().keys())
# Find tokens unique to each
writer_unique = writer_vocab - large_vocab
large_unique = large_vocab - writer_vocab
print("Tokens unique to Writer:", sorted(list(writer_unique)))
print("Tokens unique to Large:", sorted(list(large_unique)))
Output:
Tokens unique to Writer: []
Tokens unique to Large: []
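Out of curiosity, the reserved control-token slots can also be listed straight from the vocab. Quick sketch (untested), assuming they're stored as literal "[control_..." strings, which the pad-token entry above suggests:

# Sketch, untested: list the reserved [control_*] slots in the Writer tokenizer's vocab
control_tokens = sorted(
    t for t in writer_tokenizer.get_vocab() if t.startswith("[control_")
)
print("Number of [control_*] tokens:", len(control_tokens))
print("First few:", control_tokens[:5])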
Which inference engine and quant did you try? And I don't suppose you could include a raw prompt so I could reproduce it?
I did an FP8 on vLLM
Using the out-of-the-box V7 Mistral prompt in SillyTavern
Was just curious if you saw anything like that in your testing before I started experimenting on my end.
I'm digging it so far. It takes writing style suggestions in the prompt quite well.
I did an FP8 on vLLM
You must have a beast of a rig to run that :D
I rented an H200, ran the model with vLLM, and managed to reproduce it via vLLM's chat-completions API with mikupad. The first token it produced was something like [control_58], and the other 9 most-probable tokens were all [control_*] tokens as well.
Switching to vLLM's text-completions API (which is what I normally use) didn't have this problem.
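Roughly how I compared the two endpoints; a sketch from memory, assuming a local vLLM OpenAI-compatible server (e.g. started with vllm serve) and that the model is served under the repo name:

from openai import OpenAI

# Sketch, untested: point the client at a local vLLM OpenAI-compatible server
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
model = "gghfez/Writer-Large-2411-v2.1"  # assumed served model name

# Chat-completions endpoint (this is where the [control_*] tokens showed up for me)
chat = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Write the opening line of a story."}],
    max_tokens=32,
)
print("chat:", chat.choices[0].message.content)

# Text-completions endpoint with a hand-written Mistral-style prompt (no issue here)
comp = client.completions.create(
    model=model,
    prompt="[INST] Write the opening line of a story. [/INST]",
    max_tokens=32,
)
print("text:", comp.choices[0].text)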
I also diff'd your tokenizer+vocab and found no differences (your FP8 quant seems fine).
For some reason I'm not able to reproduce it using TabbyAPI's chat-completions API with exl2. I'll have to investigate further and read up on what these control tokens are used for when I have time (I'm frantically deleting, or trying to finish and release, some of my 70%-complete experiments/projects before the Hugging Face billing period ends). I suspect it's going to be related to the padding side (left vs right).
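When I get to it, this is the first thing I plan to check. A quick sketch (untested):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gghfez/Writer-Large-2411-v2.1")

# Padding configuration (the left-vs-right suspicion)
print("padding_side:", tok.padding_side)

# What the chat template renders, since chat-completions applies it
# and text-completions doesn't
msgs = [{"role": "user", "content": "Hello"}]
print(tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True))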
I'm digging it so far. It takes writing style suggestions in the prompt quite well.
Glad to hear that; that's where most of the time I spent on it went! Mistral-Large-2411 is actually a great base for writing.
I did an FP8 on vLLM
You must have a beast of a rig to run that :D
I got temporary shared access to some 4x H200 SXMs, so I'm having some fun with it 🤣
I didn't mean for you to spend much time on it; I was just curious whether you'd seen anything yourself. I've done a few FP8s of 2407-based models, and this was the second 2411-based one but the first to show this behavior, so I wanted to point it out.
I got temporary shared access to some 4x H200 SXMs, so I'm having some fun with it 🤣
That's so awesome! Reminds me of the time I happened to get over 95% off on an AWS spot instance and... had to stay up all night to use it while I could lol
I didn't mean for you to spend much time on it; I was just curious whether you'd seen anything yourself. I've done a few FP8s of 2407-based models, and this was the second 2411-based one but the first to show this behavior, so I wanted to point it out.
No, I really appreciate you pointing it out; it's much better if I'm aware and can figure it out before I spend another $400 on cloud GPUs for the next project :D