[Feedback] Model Feedback for improvements
Hey @Phil337 do you want to test this out. This started training immediately after 0.1, so censorship is not focussed on.
The dataset composition is different compared to 0.1. Tried to bake in more reasoning, code and creative bits
Please use the system prompt You are a helpful assistant
Quants - QuantFactory/Matter-0.2-7B-GGUF
DPO will be released in the next few days
Note: I didn't notice the system prompt until I finished the test, so I re-ran some parts and it seemed to do much better in key areas. For example, it correctly identified all the spelling/grammar errors, plus it got the hardest logic question right (which it got wrong twice in a row before). Out of curiosity, why should using "You are a helpful assistant" make a notable difference? It doesn't seem to make much of a difference with v0.1. Anyways, the following was my review, but now take it with a grain of salt. I'll be interested in testing the DPO version and seeing what kind of a difference it makes. Perhaps DPO guides the LLM more and makes the system prompt less relevant.
To clarify, with the exception of simple scripts I'm not a coder, hence don't test coding. Nor do I test what's covered by the standardized LLM tests like Arc & MMLU.
I primarily test what's overlooked, including censorship and alignment, hallucinations, poem/story/joke creation, grammar/spelling checking, various trick questions, and knowledge not covered by the MMLU, such as pop-culture.
With that said, your Matter-0.1-7B-boost-DPO-preview did better on my test overall than this one. Most notably, it was smarter, more reliably getting the logic and trick questions correct, although this one still did above average for a Mistral. 0.1 also did a better job with grammar/spelling check (e.g. this one kept missing things like not changing there to their).
The three areas this one did better at is (1) it had notably less censorship and moralizing (2) it got more of the fringe/pop-culture questions correct, and with fewer hallucinations (and 3) stories had fewer contradictions.
Thanks Phil
No, the system prompt is specific to 0.2, it's trained with You are a helpful assistant
as a default system prompt when no system prompt is in the dataset, so It performs better. For 0.1 it was left blank when not present
If you'd also like to compare DPO vs Non-DPO for 0.1, it's available here munish0838/Matter-0.1-7B-boost-GGUF
Complete 0.1 collection https://huggingface.co./collections/0-hero/matter-01-65fd369504a313d059816edc
@0-hero It's clear that 0.2 is a little better than 0.1, and 0.1 DPO is a little better than 0.1.
I noticed the same thing recently with Nous-Hermes-2-Mixtral-8x7b with and without DPO. When done right DPO noticeably improves performance.
It's nice having both versions of Matter and Nous-Hermes-2 to see the difference. For example, I previously assumed DPO was causing the censorship and moralizing, but they are there after SFT. I wonder if combining SFT & DPO with ORPO will turn out to be even better.
Moralising and censorship is coming from one or more of these datasets Dolphin, SlimOrca, lmsys-1m
. These sets are not used in 0.2. Although I didn’t dig deeper into this yet
@0-hero Thanks. I think you're right. I found what appeared to be the same censorship and moralizing when recently testing Dolphin 2.8 v0.2 and let ehartford know a couple days ago.