arxiv:2310.07747
Hao Sun
Holarissun
AI & ML interests
Deep RL, RL x LLM, RLHF.
Organizations
None yet
Papers
3
Models
356
Holarissun/SFT_gemma2b_hh-rlhf-helpful-gpt4_lr5e-06_epoch2-subset-1
Holarissun/SFT_gemma2b_hh-rlhf-helpful_lr5e-06_epoch2-subset-1
Holarissun/REPROD_dpo_helpfulhelpful_gpt4_subset-1_modelgemma7b_maxsteps10000_bz8_lr5e-06
Holarissun/REPROD_dpo_harmlessharmless_gpt4_subset-1_modelgemma7b_maxsteps10000_bz8_lr5e-06
Holarissun/REPROD_dpo_helpfulhelpful_human_subset-1_modelgemma7b_maxsteps10000_bz8_lr5e-06
Holarissun/REPROD_dpo_harmlessharmless_human_subset-1_modelgemma7b_maxsteps10000_bz8_lr5e-06
Holarissun/REPROD_dpo_helpfulhelpful_human_subset-1_modelgemma2b_maxsteps10000_bz8_lr5e-06
Holarissun/REPROD_dpo_harmlessharmless_human_subset-1_modelgemma2b_maxsteps10000_bz8_lr5e-06
Holarissun/REPROD_dpo_helpfulhelpful_human_subset-1_modelgemma2b_maxsteps10000_bz8_lr5e-05
Holarissun/REPROD_dpo_harmlessharmless_human_subset-1_modelgemma2b_maxsteps6000_bz8_lr5e-05
Datasets
None public yet