Light-R1 Collection Surpassing R1-Distill from Scratch* with 70k Math Data through Curriculum SFT & DPO • 3 items • Updated 6 days ago • 9
Hallucination detection Collection Trained ModernBERT (base and large) for detecting hallucinations in LLM responses. The models are trained as token classifiers. • 4 items • Updated 5 days ago • 14
Rank1: Test-Time Compute for Reranking in Information Retrieval Paper • 2502.18418 • Published 12 days ago • 25
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment Paper • 2502.16894 • Published 14 days ago • 26
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution Paper • 2502.18449 • Published 12 days ago • 67
Expect the Unexpected: FailSafe Long Context QA for Finance Paper • 2502.06329 • Published 28 days ago • 126
InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models via Human Feedback Paper • 2502.15027 • Published 17 days ago • 7
SIFT: Grounding LLM Reasoning in Contexts via Stickers Paper • 2502.14922 • Published 18 days ago • 29
Sky-T1-7B Collection A series of 7B models trained with different recipes and the corresponding training data. • 8 items • Updated 24 days ago • 6