Llamba: Scaling Distilled Recurrent Models for Efficient Language Processing Paper • 2502.14458 • Published Feb 2025
The Mamba in the Llama: Distilling and Accelerating Hybrid Models Paper • 2408.15237 • Published Aug 27, 2024
BlackMamba: Mixture of Experts for State-Space Models Paper • 2402.01771 • Published Feb 1, 2024
The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry Paper • 2402.04347 • Published Feb 6, 2024
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence Paper • 2404.05892 • Published Apr 8, 2024
Mamba: Linear-Time Sequence Modeling with Selective State Spaces Paper • 2312.00752 • Published Dec 1, 2023
Hugging Face and JFrog partner to make AI Security more transparent Article • Published 6 days ago
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4, 2025
Trained Models 🏋️ Collection They may be small, but they're training like giants! • 8 items • Updated Dec 3, 2024
Instella ✨ Collection Announcing Instella, a series of 3-billion-parameter language models developed by AMD, trained from scratch on 128 Instinct MI300X GPUs. • 5 items • Updated 4 days ago
Phi-4 Collection Phi-4 family of small language and multi-modal models. • 7 items • Updated 6 days ago