Hao Sun

Holarissun

https://holarissun.github.io/

AI & ML interests

Deep RL, RL x LLM, RLHF.

Recent Activity

upvoted a paper 17 days ago

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

upvoted a paper 18 days ago

Rethinking Diverse Human Preference Learning through Principal Component Analysis

updated a model 9 months ago

Holarissun/SFT_gemma2b_hh-rlhf-helpful-gpt4_lr5e-06_epoch2-subset-1

Organizations

None yet

Papers 3

arxiv:2310.07747

arxiv:2310.06147

arxiv:2207.05161

Models 356

Holarissun/SFT_gemma2b_hh-rlhf-helpful-gpt4_lr5e-06_epoch2-subset-1

Updated Jun 17, 2024 • 3

Holarissun/SFT_gemma2b_hh-rlhf-helpful_lr5e-06_epoch2-subset-1

Updated Jun 17, 2024 • 6

Holarissun/REPROD_dpo_helpfulhelpful_gpt4_subset-1_modelgemma7b_maxsteps10000_bz8_lr5e-06

Updated May 29, 2024 • 4

Holarissun/REPROD_dpo_harmlessharmless_gpt4_subset-1_modelgemma7b_maxsteps10000_bz8_lr5e-06

Updated May 29, 2024 • 5

Holarissun/REPROD_dpo_helpfulhelpful_human_subset-1_modelgemma7b_maxsteps10000_bz8_lr5e-06

Updated May 29, 2024 • 3

Holarissun/REPROD_dpo_harmlessharmless_human_subset-1_modelgemma7b_maxsteps10000_bz8_lr5e-06

Updated May 29, 2024 • 2

Holarissun/REPROD_dpo_helpfulhelpful_human_subset-1_modelgemma2b_maxsteps10000_bz8_lr5e-06

Updated May 28, 2024 • 3

Holarissun/REPROD_dpo_harmlessharmless_human_subset-1_modelgemma2b_maxsteps10000_bz8_lr5e-06

Updated May 28, 2024 • 2

Holarissun/REPROD_dpo_helpfulhelpful_human_subset-1_modelgemma2b_maxsteps10000_bz8_lr5e-05

Updated May 25, 2024 • 3

Holarissun/REPROD_dpo_harmlessharmless_human_subset-1_modelgemma2b_maxsteps6000_bz8_lr5e-05

Updated May 24, 2024 • 3

Datasets

None public yet
