DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting Paper • 2503.00784 • Published 7 days ago • 9
Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning Paper • 2402.05808 • Published Feb 8, 2024
Article Mini-R1: Reproduce Deepseek R1 „aha moment“ a RL tutorial By open-r1 • Jan 31 • 42
CritiQ: Mining Data Quality Criteria from Human Preferences Paper • 2502.19279 • Published 11 days ago • 9
Qwen2.5 Collection Qwen2.5 language models, including pretrained and instruction-tuned models in 7 sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 46 items • Updated 11 days ago • 552
Article Saving Memory Using Padding-Free Transformer Layers during Finetuning By mayank-mishra • Jun 11, 2024 • 16
AgentGym: Evolving Large Language Model-based Agents across Diverse Environments Paper • 2406.04151 • Published Jun 6, 2024 • 20