How do I make the model output JSON?
Is it just an instruction or is there a particular prompt syntax that I need to follow? I am using a GGUF version if that matters.
Currently I parse the output with Pydantic, but sometimes the output cannot be parsed correctly.
I believe truly reliable (100%) JSON output is mostly achieved with guided decoding, in addition to telling the model in the prompt to generate JSON.
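As a sketch of what guided decoding looks like: vLLM's OpenAI-compatible server accepts a JSON schema via the guided_json extra-body parameter (that parameter is vLLM-specific; TGI and llama.cpp expose the same idea under different names), and you can derive the schema straight from a Pydantic model. The model name and fields below are just placeholders:

from openai import OpenAI
from pydantic import BaseModel

class Company(BaseModel):  # hypothetical target structure
    name: str
    country: str

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    messages=[{"role": "user", "content": "Extract the company from: Acme GmbH, Germany."}],
    extra_body={"guided_json": Company.model_json_schema()},  # constrain decoding to the schema
)
print(Company.model_validate_json(response.choices[0].message.content))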
If you are using Ollama, there is a json mode: https://github.com/ollama/ollama/blob/main/docs/api.md#request-json-mode and the example scripts are at https://github.com/ollama/ollama/tree/main/examples/python-json-datagenerator
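A minimal sketch of that JSON mode over the REST API (assumes a local Ollama server on the default port and an already-pulled model; "llama3" is just a placeholder name):

import json
import requests

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3",
        "messages": [{"role": "user", "content": "List three colors as JSON with a 'colors' array."}],
        "format": "json",   # Ollama constrains the output to valid JSON
        "stream": False,
    },
)
data = json.loads(resp.json()["message"]["content"])
print(data)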
Just make it output YAML instead! You'll save money on tokens and the headache of parsing broken JSON from an LLM.
YAML? I have never even seen YAML mentioned as a possibility. Is this a serious suggestion, or just something that will consume more of my time? How do I get it to output YAML? Just by asking it nicely, or is there a "proper" way?
Could you please share sample code showing how you make it work with Pydantic? And which framework are you using to serve the model?
I cannot get it to work with TGI (model loads fine).
Actually, I no longer use pydantic. I instead consulted the documentation (lol) and now have this:
# self.client is an OpenAI client instance; response_format={"type": "json_object"}
# turns on JSON mode, so the reply is guaranteed to be syntactically valid JSON
response = self.client.chat.completions.create(
    model="gpt-3.5-turbo-1106",
    messages=[
        {"role": "system", "content": "You are a professional business researcher analyzing manufacturer websites."},
        {"role": "user", "content": prompt},
    ],
    response_format={"type": "json_object"},
    temperature=0.1,
)
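If you still want Pydantic in the loop: JSON mode guarantees valid JSON, not that the JSON matches your schema, so it can be worth validating the parsed content afterwards. A rough sketch with a made-up model:

from pydantic import BaseModel, ValidationError

class Manufacturer(BaseModel):  # hypothetical schema for the extracted data
    name: str
    website: str

try:
    result = Manufacturer.model_validate_json(response.choices[0].message.content)
except ValidationError as e:
    # valid JSON, but not the shape you asked for: retry or log
    print(e)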
I use llama.cpp to run the model with this command: "./llama-server -m ~/models/qwen2.5-7b-instruct-q6_k-00001-of-00002.gguf -c 0 --mirostat 2 -fa -j {}"
You can add your own flags for GPU offloading and other performance tuning.
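For what it's worth, if I read the flags right, -j {} constrains generation to a JSON schema (an empty schema allows any JSON object), and llama-server exposes an OpenAI-compatible endpoint, so a chat.completions snippet like the one above should work against it by pointing the client at the local server. A minimal sketch, assuming the default port 8080:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-local")  # any non-empty key works locally
response = client.chat.completions.create(
    model="qwen2.5-7b-instruct",  # placeholder; llama-server serves whatever model it was started with
    messages=[{"role": "user", "content": "Return the answer as a JSON object."}],
    temperature=0.1,
)
print(response.choices[0].message.content)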
Thank you very much. This solution also worked for me before; it will output JSON. But if I want to use Pydantic, I still cannot get it to work with TGI.