Offline Reinforcement Learning for LLM Multi-Step Reasoning • Paper • 2412.16145 • Published 5 days ago • 30
RLHFlow MATH Process Reward Model • Collection • A collection of datasets and models for process reward modeling. • 15 items • Updated Nov 9 • 7