arxiv:2412.10856

RWKV-Lite: Deeply Compressed RWKV for Resource-Constrained Devices

Published on Dec 14, 2024

Abstract

To deploy LLMs on resource-constrained platforms such as mobile robots and smartphones, non-transformer LLMs have achieved major breakthroughs. Recently, a novel RNN-based LLM family, Receptance Weighted Key Value (RWKV), has shown strong computational efficiency; nevertheless, RWKV models still have high parameter counts, which limits their deployment. In this paper, we propose a suite of compression techniques, ranging from model architecture optimizations to post-training compression, tailored to the RWKV architecture. Combined, our techniques reduce the memory footprint of RWKV models by 3.4x -- 5x with only negligible degradation in accuracy; compared to transformer LLMs of similar accuracy, our models require a 4x smaller memory footprint.
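The headline numbers can be put in perspective with a quick back-of-the-envelope calculation. The sketch below is not from the paper; the 1.6B parameter count and fp16 weight storage are assumptions chosen only to illustrate what a 3.4x -- 5x footprint reduction means in practice.

```python
# Rough illustration of the memory savings range quoted in the abstract.
# Assumptions (not figures from the paper): a 1.6B-parameter RWKV model
# stored as fp16 weights (2 bytes per parameter).

PARAMS = 1.6e9          # assumed parameter count
BYTES_PER_PARAM = 2     # fp16 weights

baseline_gb = PARAMS * BYTES_PER_PARAM / 1024**3
for ratio in (3.4, 5.0):  # compression range reported in the abstract
    print(f"{ratio}x compression: {baseline_gb:.2f} GB -> {baseline_gb / ratio:.2f} GB")
```

Under these assumptions, a roughly 3 GB checkpoint would shrink to about 0.6 -- 0.9 GB, which is the kind of budget a smartphone or embedded robot controller can realistically accommodate.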

