tzwilliam0/maxmin-dpo-init-kl-coef-0.5-fix-lora-dongnan • Reinforcement Learning • Updated 9 days ago • 16
tzwilliam0/maxmin-dpo-init-kl-coef-0.1-fix-lora-dongnan • Reinforcement Learning • Updated 9 days ago • 17
tzwilliam0/maxmin-dpo-init-kl-coef-0.1-fix-reward-norm-dongnan • Reinforcement Learning • Updated 3 days ago • 5
tzwilliam0/maxmin-dpo-init-kl-coef-0.5-fix-reward-norm-dongnan • Reinforcement Learning • Updated 3 days ago • 5