Sao10K/I_am_alive_yay · A Request For reasoning model

santosgamer01

Nov 30, 2024

•

edited Nov 30, 2024

open source reasoning models are out is it possible to finetune them to get our outputs

qwen 32b preview

Good to have U back

Kenshiro-28

Nov 30, 2024

•

edited Nov 30, 2024

@Sao10K welcome back! :)

@santosgamer01 that model would be great, but anyway you can push reasoning on standard models with a good system prompt, although answers will have a standard length, which can be good depending of what you want. Try:

You are a friendly AI assistant. Reason through the query, then reflect on your reasoning, and finally provide your response.

You can replace the first sentence "You are a friendly AI assistant." with other personality like "You are Misato Katsuragi from the anime Neon Genesis Evangelion."

Nelathan

Dec 6, 2024

So nice to have you back @Sao10K !

A few days ago, I came across this post on Twitter about a paper that shows how training models on post-hoc reasoning helps them understand implicit meanings better. It got me thinking about how we could use this for improving roleplaying models, especially for things like emotional depth and realistic character behavior.

Here’s the rough idea:

Filter existing datasets to find examples where there’s implicit reasoning happening—like when a character’s motivations or emotions are hinted at but not directly stated.
Use tools like QwQ or o1 (or any other reasoning frameworks) to backpropagate through these examples, explicitly breaking down the logic. Basically, take those "why" questions and make the reasoning behind them crystal clear.
Train the model on these explicit reasonings so it doesn’t just regurgitate text—it actually starts to "understand" the why behind a character’s actions, decisions, or emotional responses.

If this works the way I’m imagining, we could see a big jump in emotional depth and the ability to pick up on subtleties like humor or complex interpersonal dynamics. Maybe even actual good humor, not just random one-liners that miss the tone.

What do you think? Would love to hear feedback or ideas to push this further!