Generalizable Reward Models
- Ray2333/GRM-llama3-8B-sftreg (Text Classification)
- Ray2333/GRM-llama3-8B-distill (Text Classification)
- Ray2333/GRM-Gemma-2B-sftreg (Text Classification)
Paper: Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs (arXiv:2406.10216)