DPO Dataset Qs

#2
by RonanMcGovern - opened

Thanks for making this model!

Three Qs:

  • I understand teknium/OpenHermes-2.5 was used to originally do SFT on Mistral to get the base SFT model. But what dataset was used for the DPO itself?
  • I notice that teknium/OpenHermes-2.5 has "from" instead of "role" and "value" instead of "content"... so I suppose you swapped those when formatting the chat data for the DPO (and SFT) steps?
  • How did you decide what hyperparams to use for the DPO?
deleted
This comment has been hidden

I'd also love to know about the datasets used for DPO in case we can help improving them!

Sign up or log in to comment