Llamba: Scaling Distilled Recurrent Models for Efficient Language Processing Paper • 2502.14458 • Published Feb 2025
The Mamba in the Llama: Distilling and Accelerating Hybrid Models Paper • 2408.15237 • Published Aug 27, 2024
BlackMamba: Mixture of Experts for State-Space Models Paper • 2402.01771 • Published Feb 1, 2024
The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry Paper • 2402.04347 • Published Feb 6, 2024
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence Paper • 2404.05892 • Published Apr 8, 2024
Mamba: Linear-Time Sequence Modeling with Selective State Spaces Paper • 2312.00752 • Published Dec 1, 2023
Hugging Face and JFrog partner to make AI Security more transparent Article • Published 6 days ago
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4, 2025
Trained Models 🏋️ Collection They may be small, but they're training like giants! • 8 items • Updated Dec 3, 2024
Instella ✨ Collection Announcing Instella, a series of 3-billion-parameter language models developed by AMD, trained from scratch on 128 Instinct MI300X GPUs. • 5 items • Updated 4 days ago
Phi-4 Collection Phi-4 family of small language and multi-modal models. • 7 items • Updated 6 days ago