Article • Yay! Organizations can now publish blog Articles • By huggingface • 7 days ago • 30
Article • MiniMax-01 is Now Open-Source: Scaling Lightning Attention for the AI Agent Era • By MiniMax-AI • 13 days ago • 40
SmolLM2 Collection • State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 15 items • Updated Dec 22, 2024 • 207
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published 20 days ago • 249
DolphinLabeled Datasets Collection • Eric Hartford has added labels to help you filter datasets. • 5 items • Updated 21 days ago • 9
RedPajama: an Open Dataset for Training Large Language Models Paper • 2411.12372 • Published Nov 19, 2024 • 49
Balancing Pipeline Parallelism with Vocabulary Parallelism Paper • 2411.05288 • Published Nov 8, 2024 • 19
LoLCATS Collection • Linearizing LLMs with high quality and efficiency. We linearize the full Llama 3.1 model family (8B, 70B, 405B) for the first time! • 4 items • Updated Oct 14, 2024 • 15
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning Paper • 2409.12183 • Published Sep 18, 2024 • 37
Enhancing Training Efficiency Using Packing with Flash Attention Paper • 2407.09105 • Published Jul 12, 2024 • 15
Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler Paper • 2408.13359 • Published Aug 23, 2024 • 23
Power-LM Collection • Dense & MoE LLMs trained with the Power learning rate scheduler. • 4 items • Updated Oct 17, 2024 • 15