Can LLMs Maintain Fundamental Abilities under KV Cache Compression? Paper • 2502.01941 • Published Feb 4, 2025 • 10
FastKV: KV Cache Compression for Fast Long-Context Processing with Token-Selective Propagation Paper • 2502.01068 • Published Feb 3, 2025 • 14
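The FastKV entry above is about propagating only a selected subset of tokens' KV entries to deeper layers. As a rough illustration only (a minimal sketch, not FastKV's actual algorithm), the following keeps the most-attended tokens' KV entries for one head:

```python
import torch

def select_tokens_for_propagation(keys, values, attn_scores, keep_ratio=0.25):
    """Toy token-selective KV propagation (NOT FastKV's actual method):
    keep only the most-attended tokens' KV entries for later layers.

    keys, values: [seq_len, head_dim] tensors for one head.
    attn_scores:  [seq_len] aggregate attention each token received.
    """
    seq_len = keys.shape[0]
    k = max(1, int(seq_len * keep_ratio))
    # Indices of the k most-attended tokens, restored to original order.
    top = torch.topk(attn_scores, k).indices.sort().values
    return keys[top], values[top], top

# Example: compress a 1024-token KV cache for one head down to 25%.
seq_len, head_dim = 1024, 128
keys = torch.randn(seq_len, head_dim)
values = torch.randn(seq_len, head_dim)
attn_scores = torch.rand(seq_len)
k_sel, v_sel, kept = select_tokens_for_propagation(keys, values, attn_scores)
print(k_sel.shape, v_sel.shape)  # torch.Size([256, 128]) twice
```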
SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks Paper • 2402.09025 • Published Feb 14, 2024 • 7
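SLEB's premise is that some transformer blocks barely change their input and can be removed. A minimal sketch of one plausible redundancy score (mean input/output cosine similarity; not the paper's exact criterion) on a toy residual stack:

```python
import torch
import torch.nn.functional as F

def block_redundancy(hidden_in, hidden_out):
    """Toy redundancy score in the spirit of SLEB (not its exact metric):
    a block whose output nearly equals its input is a removal candidate."""
    return F.cosine_similarity(hidden_in, hidden_out, dim=-1).mean().item()

# Example: rank blocks of a toy 4-layer residual stack by redundancy.
hidden = torch.randn(8, 64)          # [tokens, hidden_dim]
blocks = [torch.nn.Linear(64, 64) for _ in range(4)]
scores = []
for i, blk in enumerate(blocks):
    out = hidden + blk(hidden)       # residual connection, as in transformers
    scores.append((block_redundancy(hidden, out), i))
    hidden = out
# Highest-similarity blocks change the representation least.
for sim, i in sorted(scores, reverse=True):
    print(f"block {i}: cosine sim {sim:.3f}")
```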
hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4 Text Generation • Updated Aug 7, 2024 • 12.9k • 21
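A minimal sketch of loading the INT4 GPTQ checkpoint above with transformers, assuming a CUDA GPU and a GPTQ backend (e.g. `pip install optimum auto-gptq`) are available:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hugging-quants/Meta-Llama-3.1-8B-Instruct-GPTQ-INT4"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# The GPTQ quantization config ships with the checkpoint, so no extra
# quantization arguments are needed here.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Summarize KV cache compression in one line."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```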
Mixture-of-Agents Enhances Large Language Model Capabilities Paper • 2406.04692 • Published Jun 7, 2024 • 56
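The Mixture-of-Agents paper layers several proposer models whose answers feed an aggregator. A toy sketch of that flow (not the paper's implementation; `call_model` is a hypothetical stand-in for any chat-completion client):

```python
from typing import Callable, List

def mixture_of_agents(
    prompt: str,
    proposers: List[str],
    aggregator: str,
    call_model: Callable[[str, str], str],
    layers: int = 2,
) -> str:
    """Each layer, every proposer answers the prompt augmented with the
    previous layer's answers; a final aggregator synthesizes the last layer."""
    answers: List[str] = []
    for _ in range(layers):
        context = "\n\n".join(f"Reference answer: {a}" for a in answers)
        answers = [call_model(m, f"{context}\n\nQuestion: {prompt}".strip())
                   for m in proposers]
    refs = "\n\n".join(answers)
    return call_model(
        aggregator,
        f"Synthesize the best answer from these responses:\n{refs}\n\nQuestion: {prompt}",
    )

# Example with a trivial echo stub in place of real model calls.
stub = lambda model, p: f"[{model}] answer"
print(mixture_of_agents("What is a KV cache?", ["m1", "m2"], "agg", stub))
```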