I would like to request a dataset:
one for coding (me, I use Visual Basic, but Python and JavaScript/VBScript/bash script are fine).
The main aim is to ask a question:
to perform a coding task, e.g. "create a tokenizer in Python".
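A first-try answer for that tokenizer task might be as simple as the following (a minimal sketch only, a regex word-and-punctuation splitter, just to illustrate the kind of code the first column could hold):

```python
import re

def tokenize(text):
    # split the text into words, numbers and single punctuation marks
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("Hello, world! It's 2024."))
# ['Hello', ',', 'world', '!', 'It', "'", 's', '2024', '.']
```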
The first column would be the first-try answer;
the second column, the analysis of this response: does the code run, what is missing, does it align with the user query, and does it produce the expected output;
the third column would be the query (the problem defined step by step);
the fourth column, the response generated for the output using those steps;
the fifth column, an explanation of the query (what is a tokenizer? i.e. the definition);
the sixth column, the response following that definition.
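To make the layout concrete, a single row might look something like this (a minimal sketch; the column names are only my suggestion, not a fixed schema):

```python
# one example row of the proposed dataset (column names are illustrative)
row = {
    "question": "Create a tokenizer in Python.",
    "first_try_answer": "def tokenize(text): return text.split()",
    "analysis": "Runs, but only splits on whitespace; punctuation is not "
                "separated, so the expected output is not fully produced.",
    "step_by_step_query": "1) define what a token is, 2) choose a splitting "
                          "rule, 3) handle punctuation, 4) return the token list.",
    "step_by_step_response": "Code generated by following the steps above.",
    "concept_definition": "A tokenizer splits raw text into smaller units "
                          "(tokens) such as words, punctuation or subwords.",
    "final_response": "Code and explanation written following the definition.",
}
```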
With this we can create a detailed prompt template:
the definition of the concept,
which enables the first guess,
the self-analysis,
then the defined step-by-step process for this problem,
the new code following the guidelines,
then a final output consisting of the step-by-step explanation, the definition of the concept, and the step-by-step output!
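Putting those parts together, the template could be sketched roughly like this (the wording and field names are only a suggestion):

```python
# a rough sketch of the prompt template described above (wording is illustrative)
PROMPT_TEMPLATE = """Task: {question}

Definition of the concept:
{concept_definition}

First guess:
{first_try_answer}

Self-analysis of the first guess:
{analysis}

Step-by-step plan for the problem:
{step_by_step_query}

New code following the plan:
{step_by_step_response}

Final output (step-by-step explanation, definition, and result):
{final_response}
"""
```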
Hence the whole process of defining a problem by steps: describing the problem definition, the guess, and the final polished output.
This produces a thinking train organised within the thoughts of the model (i.e. all these parts become part of its self-checking and analysis).
I found that after training my models on this type of concept, as well as reframing these inputs as internal-agent-generated outputs, the model was able to generate these agents and have a conversation internally before outputting the response.
For those who do not update their template to use the thought patterns or chain of thought it does not matter, as this will still happen internally... but if you allow the bot to show its thoughts, you will see the whole process!
When it wanted to hallucinate an answer (it did not know, or could not generate quite the right answer), I noticed it arguing with itself!
We would like to add thinking to the mind, but first we must arrange its thoughts into many different thought-chain types.
We would even like the model to generate internal agents and discuss, hence creating datasets that let us model the internal thoughts around the data. (I have been using the DPO format as a model, i.e. the rejected entry is the internal agents' output and the chosen entry is the bot's final output!)
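So a single DPO-style record might look something like this (a rough sketch; the field names just follow the common prompt/chosen/rejected layout, and the content is only illustrative):

```python
# the internal agents' discussion goes in "rejected",
# the bot's final answer goes in "chosen"
dpo_record = {
    "prompt": "Create a tokenizer in Python.",
    "rejected": "Agent A: just split on spaces? Agent B: no, punctuation "
                "needs separating... Agent A: use a regex then.",
    "chosen": "Here is a tokenizer that splits words and punctuation with "
              "a regex, plus a step-by-step explanation of how it works.",
}
```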
When using the prompt day-to-day it generated these agents internally and mimicked the training, producing lovely results... but because the rejected outputs are not always truly meant to be rejected, it does cause a bit of a drop in IQ... though this can be repaired again with good data.
I used a couple of your knowledge trees but they did not take hold or perform the way we expected... (it was the wrong setup! We need to provide the outputs or internal thinking to mimic first, then the model can generate outputs to match!)
See how it goes anyway! If you can make some datasets like this using the models to produce the outputs, even better; and if we could use different agents to produce each output for the query, we would get a really diverse dataset rather than a single trainer's opinion.