[Community] Please share your prompts and experiences: positive or negative!
I do a lot of my own testing, but it'd be nice to hear from other users if they've had any significantly positive or negative experiences using this model. When posting, please be sure to include the quant used (as quants below a static Q8_0 may have precision loss) and inference settings. Thanks, GLHF!
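For example (a minimal sketch, assuming llama-cpp-python; the model filename and sampler values below are placeholders, not recommendations), a useful report pins down the quant and inference settings alongside the result:

```python
# Minimal sketch of a reproducible test setup, assuming llama-cpp-python.
# The model filename and sampler values are placeholders, not recommendations;
# the point is to record the quant and settings when posting feedback.
from llama_cpp import Llama

llm = Llama(
    model_path="model-Q8_0.gguf",  # quant: static Q8_0 (placeholder name)
    n_ctx=4096,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    temperature=0.8,      # include these settings in your report
    top_p=0.95,
    repeat_penalty=1.1,
)
print(out["choices"][0]["message"]["content"])
```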
T1
Hello, I'm testing LLMs by giving them a theme, having them come up with a story, and then having them generate a caption-style image prompt. It's a pretty inefficient way of testing prompt generation, but it's fun to see how each LLM responds. The theme is given in Japanese (and is NSFW), while the prompt must be output in English, so the test also demands multilingual performance. It's not exactly fair, but it serves as a kind of benchmark.
Fewer than half of the models manage to arrive at a usable prompt. Some refuse to respond outright, and getting to the end means clearing several hurdles: language comprehension, instruction following, real-world knowledge, vocabulary, and creativity. I always give a Like to an LLM that clears these hurdles, though that doesn't mean I won't give Likes otherwise.
The ZEUS series consistently generates prompts in this test. If I notice anything noteworthy during future testing, I'll come back here and write about it.
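If anyone wants to try the same kind of two-step test, here is a rough sketch; the `generate()` helper and the prompt wording are placeholders for whatever inference call and phrasing you actually use:

```python
# Rough sketch of the two-step test: theme -> story -> caption-style image
# prompt. `generate` is a placeholder for your own inference call, and the
# prompt wording is illustrative only.
def generate(prompt: str) -> str:
    raise NotImplementedError("plug in your own model call here")

theme = "真夏の海辺"  # theme given in Japanese ("a beach in midsummer")

# Step 1: ask for a short story on the theme (in Japanese).
story = generate(f"次のテーマで短い物語を書いてください: {theme}")

# Step 2: ask for a single caption-style image prompt in English.
image_prompt = generate(
    "Write one caption-style image prompt in English "
    f"for the following story:\n{story}"
)
print(image_prompt)
```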
Interesting; do you have a more SFW example you could show? If it's in JP, that's OK. I'd be particularly interested if you have an example that you've used across a couple of models! V2, V8, and V2-ORPO are my current recommendations for regular use.
I haven't tried many SFW examples...
I think some LLM leaderboards evaluate that sort of thing numerically.
Sometimes I do feed a very simple SFW system prompt and user prompt to a model that couldn't complete the task above, just to check whether the model itself is broken. I haven't needed to do that with ZEUS, since it completed the task.
I don't have much opportunity to evaluate individual models in detail...
Of course I could test ZEUS's Japanese language capabilities, but I don't have a reliable baseline in my head to compare against.
Looking at GGUF download counts on HF in general, someone somewhere in the world must be using these models...
There is very little feedback on the Hub. Well, that's how OSS goes, AI included.
There are also model authors who have set up their own demos and collect the chat history into their datasets. I haven't read the source code in detail, but I think that approach would be useful for testing how a model behaves in practice.
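As a sketch of that pattern (assuming Gradio; the log path and the echoed reply are placeholders, and a real demo would call the model instead):

```python
# Minimal sketch of a demo that records chat history for later dataset use,
# assuming Gradio. LOG_PATH and the echoed reply are placeholders; a real
# demo would run actual inference in chat().
import json
import gradio as gr

LOG_PATH = "chat_log.jsonl"  # hypothetical log file

def chat(message, history):
    reply = f"(model reply to: {message})"  # placeholder for real inference
    with open(LOG_PATH, "a", encoding="utf-8") as f:
        record = {"user": message, "assistant": reply}
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return reply

gr.ChatInterface(chat).launch()
```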