Towards Best Practices for Open Datasets for LLM Training Paper • 2501.08365 • Published Jan 14 • 56
Preference Leakage: A Contamination Problem in LLM-as-a-judge Paper • 2502.01534 • Published Feb 3 • 39
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 199
Instruction-Following Evaluation for Large Language Models Paper • 2311.07911 • Published Nov 14, 2023 • 20