DPO Dataset Qs
#2
by
RonanMcGovern
- opened
Thanks for making this model!
Three Qs:
- I understand teknium/OpenHermes-2.5 was used to originally do SFT on Mistral to get the base SFT model. But what dataset was used for the DPO itself?
- I notice that teknium/OpenHermes-2.5 has "from" instead of "role" and "value" instead of "content"... so I suppose you swapped those when formatting the chat data for the DPO (and SFT) steps?
- How did you decide what hyperparams to use for the DPO?
I'd also love to know about the datasets used for DPO in case we can help improving them!