Model keeps outputting weird text
I have run into some very strange behavior with the model: it keeps outputting nonsensical text. For example:
I use the model as-is, without quantization or conversion to other formats. The chat template I use is Llama.
In the image above, the gray text is the input and the white text is the model response. For some reason, the response includes the "[INST] ... on the DNA template" paragraph, which is not part of my input. Pretty weird.
For some other questions, the model does not include weird text like that, but it does prepend the text "CONTEXT INFORMATION" to the response and append a single double quote at the end.
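For reference, stray [INST] markers in a response usually mean the prompt the model actually receives does not match the chat format it expects. A minimal sketch of how to see what the llama-2-style template renders, assuming the Hugging Face transformers tokenizer and that it ships a chat template (the message content is a placeholder):

```python
# Sketch: render the chat template to compare against what the serving layer
# actually sends the model. Assumes the tokenizer defines a chat template.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("/abacus/models/Smaug-72B-v0.1")  # local model path

messages = [{"role": "user", "content": "Find abbreviations in this text ..."}]

# For llama-2-style templates, each user turn is wrapped in [INST] ... [/INST];
# [INST] appearing in the *output* suggests the input was not wrapped this way.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```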
Please let me know your thoughts.
Hello,
I ran a quick conversation approximating what you shared in the screenshot (I did not try to transcribe all the text in the image):
root@llgh01:/repos/FastChat/fastchat/serve# python cli.py --max-gpu-memory=40GB --model-path=/abacus/models/Smaug-72B-v0.1 --conv-template=llama-2 --temperature=0.0 --max-new-tokens=200 --multiline
WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.
[INST] [ctrl-d/z on empty line to end]: Find abbreviations in this text and return a JSON object with the abbreviation as a key and the meaning as the value.
While other mucosal barrier tissues orchestrate adaptive immunity through well-defined mucosa-associated lymphoid tissue (MALT).
[/INST]: The given text does not contain any abbreviations. Therefore, there are no abbreviations to return as a JSON object.
[INST] [ctrl-d/z on empty line to end]: What about MALT?
[/INST]: MALT in the given text refers to "mucosa-associated lymphoid tissue." It is not an abbreviation in this context, but rather a full term used in the field of immunology.
[INST] [ctrl-d/z on empty line to end]: Can you output that as JSON
[/INST]: Sure, but there's no abbreviation to create a JSON object for in this case. Here can consider this as an example if there was one: {"MALT": "mucosa-associated lymphoid tissue"}
While it would have been nice for the model to recognize MALT as an abbreviation in the first turn, it is clear that the model is not producing garbled text and remains coherent over multiple turns with this conversation template.
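If you want to rule out the FastChat layer entirely, here is a minimal sketch that loads the unquantized weights directly with transformers and generates greedily, which is equivalent to temperature 0 (assuming the transformers/accelerate stack and enough combined GPU and CPU memory, as in the run above):

```python
# Sketch: reproduce the check with plain transformers, no FastChat involved.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/abacus/models/Smaug-72B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto",  # spills layers to CPU when GPU memory runs out
)

messages = [{"role": "user", "content": "Find abbreviations in this text and return a JSON object ..."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200, do_sample=False)  # greedy == temperature 0
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```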
I'm a novice and tried this model in LM Studio on an M3 Max -- it output gibberish when run on the Metal GPU, but it was coherent when restricted to the CPU. With the above prompt it generated the JSON and recognized MALT as an acronym -- though I'm using a 5-bit quantized GGUF version from sensable.
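One way to reproduce that CPU-vs-Metal comparison outside LM Studio is llama-cpp-python, which wraps the same llama.cpp backend. A sketch, assuming a local GGUF file (the filename below is a placeholder):

```python
# Sketch: run the same prompt with and without GPU offloading for a GGUF build.
from llama_cpp import Llama

def run(n_gpu_layers: int) -> str:
    llm = Llama(
        model_path="smaug-72b-v0.1.Q5_K_M.gguf",  # placeholder filename
        n_gpu_layers=n_gpu_layers,  # 0 = CPU only; -1 = offload all layers (Metal on Apple Silicon)
        n_ctx=2048,
    )
    out = llm.create_chat_completion(
        messages=[{"role": "user", "content": "What does MALT stand for?"}],
        temperature=0.0,
        max_tokens=100,
    )
    return out["choices"][0]["message"]["content"]

print("CPU only:", run(0))
print("GPU/Metal:", run(-1))
```

If the CPU run is coherent and the Metal run is garbled, that points at the quantized build or the Metal backend rather than the original model weights.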
The issues people are facing are caused by errors in alternate model formats/quantizations. We are unable to support these alternate options.