SFTvsRL Models & Data
Collection
This collection contains 4 initial checkpoints for https://github.com/LeslieTrue/SFTvsRL and the data required for V-IRL training.
This model serves as an initial checkpoint for reproducing the results of the paper "SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training".
Website: https://tianzhechu.com/SFTvsRL/
GitHub: https://github.com/LeslieTrue/SFTvsRL
arXiv: https://arxiv.org/abs/2501.17161v1
HF: https://huggingface.co/papers/2501.17161