The MHA2MLA models published in the paper "Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs".
Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs
Paper • 2502.14837
fnlp/Llama-2-7B-MLA-d_kv_16
Text Generation
fnlp/Llama-2-7B-MLA-d_kv_32
Text Generation
fnlp/Llama-2-7B-MLA-d_kv_64
Text Generation