Mamba distilled from Llama 3.2 3B Instruct. See The Mamba in the Llama: Distilling and Accelerating Hybrid Models (https://arxiv.org/abs/2408.15237).
Junxiong Wang
JunxiongWang
AI & ML interests
Attention-Free Models / Subquadratic Language Models
Models (30)
JunxiongWang/Mamba2InLlama3B_Half_DPO
JunxiongWang/Mamba2InLlama3B_Half
JunxiongWang/MambaByte_Stories (Text Generation)
JunxiongWang/MambaByte_Arxiv (Text Generation)
JunxiongWang/MambaByte_PG19_353M (Text Generation)
JunxiongWang/MambaByte_Books (Text Generation)
JunxiongWang/MambaByte_Code (Text Generation)
JunxiongWang/MambaByte_PG19_972M (Text Generation)
JunxiongWang/Mamba2InLlama_1
JunxiongWang/Mamba2InLlama_0_50