Article Formatting Datasets for Chat Template Compatibility By nroggendorff • Jun 28, 2024 • 8
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs Paper • 2402.14740 • Published Feb 22, 2024 • 13
HARP: Hesitation-Aware Reframing in Transformer Inference Pass Paper • 2412.07282 • Published Dec 10, 2024 • 5