Qwen2-Audio-7B-Instruct-rkllm
(English README see below)
Run the powerful Qwen2-Audio-7B-Instruct audio model on RK3588!
- Inference speed (RK3588, 10s audio input): Audio encoder 12.2s (single NPU core) + LLM prefill 4.4s (282 tokens / 64.7 tps) + decoding 3.69 tps
- Memory usage (RK3588, context length 768): 11.6GB
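Usage
Clone or download this repository. The model is large, so make sure you have enough disk space.
The RKNPU2 kernel driver on your development board must be version 0.9.6 or newer to run a model this large. Check the driver version as root:
> cat /sys/kernel/debug/rknpu/version
RKNPU driver: v0.9.8
If the version is too low, update the driver. You may need to update the kernel, or consult the official documentation for help.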
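- Install dependencies
pip install "numpy<2" opencv-python rknn-toolkit-lite2 librosa transformers
- Run
python multiprocess_inference.py
If performance is lower than expected, set the CPU frequency governor so the CPU always runs at its highest frequency, and pin the inference program to the big cores (taskset -c 4-7 python multiprocess_inference.py).
If you encounter llvm-related errors, update the llvmlite library: pip install --upgrade llvmlite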
W rknn-toolkit-lite2 version: 2.3.0
Start loading audio encoder model (size: 1300.25 MB)
Start loading language model (size: 8037.93 MB)
I rkllm: rkllm-runtime version: 1.1.2, rknpu driver version: 0.9.8, platform: RK3588
Audio encoder loaded in 13.65 seconds
I RKNN: [20:30:05.616] RKNN Runtime Information, librknnrt version: 2.3.0 (c949ad889d@2024-11-07T11:35:33)
I RKNN: [20:30:05.616] RKNN Driver Information, version: 0.9.8
I RKNN: [20:30:05.617] RKNN Model Information, version: 6, toolkit version: 2.3.0(compiler version: 2.3.0 (c949ad889d@2024-11-07T11:39:30)), target: RKNPU v2, target platform: rk3588, framework name: ONNX, framework layout: NCHW, model inference type: static_shape
W RKNN: [20:30:07.950] query RKNN_QUERY_INPUT_DYNAMIC_RANGE error, rknn model is static shape type, please export rknn with dynamic_shapes
W Query dynamic range failed. Ret code: RKNN_ERR_MODEL_INVALID. (If it is a static shape RKNN model, please ignore the above warning message.)
Received ready signal: audio_ready
Language model loaded in 9.94 seconds
Received ready signal: llm_ready
All models loaded, starting interactive mode...
Enter your input (3 empty lines to start inference, Ctrl+C to exit, for example:
这是什么声音{{./jntm.mp3}}?
What kind of sound is in {{./test.mp3}}?
Describe the audio in {{./jntm.mp3}}
这是什么动物的叫声{{./jntm.mp3}}?
):
这是什么声音{{./jntm.mp3}}??????
Start audio inference...
Received prompt: ====<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Audio 1: <image>
这是什么声音??????<|im_end|>
<|im_start|>assistant
====
/home/firefly/mnt/zt-back/Qwen2-7B-audiow/./multiprocess_inference.py:43: UserWarning: PySoundFile failed. Trying audioread instead.
  audio, _ = librosa.load(audio_path, sr=feature_extractor.sampling_rate)
/home/firefly/.local/lib/python3.9/site-packages/librosa/core/audio.py:184: FutureWarning: librosa.core.audio.__audioread_load
  Deprecated as of librosa version 0.10.0.
  It will be removed in librosa version 1.0.
  y, sr_native = __audioread_load(path, offset, duration, dtype)
Audio encoder inference time: 12.22 seconds
(1, 251, 4096)
(1, 251, 4096)
Start LLM inference...
🎉 完成!
Time to first token: 4.28 seconds
语音中是一段音乐,包含唱歌和乐器演奏。背景音乐里有鼓声、贝斯、钢琴和小号的演奏,同时背景能够听到胃里咕咕作响和吃东西的声音。这首歌可能是用于广告。
(finished)
--------------------------------------------------------------------------------------
 Stage         Total Time (ms)  Tokens    Time per Token (ms)      Tokens per Second
--------------------------------------------------------------------------------------
 Prefill       4269.62          283       15.09                    66.28
 Generate      13279.37         49        272.13                   3.67
--------------------------------------------------------------------------------------
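The sample session suggests that the audio path is written inside `{{...}}` in the user input, and that the placeholder is removed and an `Audio 1: <image>` line is prepended before the text reaches the LLM. A minimal sketch of that transformation follows. The function names `split_prompt` and `build_prompt` are hypothetical, the exact newline placement in the template is an assumption, and the real logic in multiprocess_inference.py may differ.

```python
import re

# Matches the {{path}} placeholder used in the example prompts.
PLACEHOLDER = re.compile(r"\{\{(.+?)\}\}")


def split_prompt(user_input: str):
    """Return (audio_path or None, text with the first placeholder removed)."""
    match = PLACEHOLDER.search(user_input)
    if match is None:
        return None, user_input
    text = user_input[:match.start()] + user_input[match.end():]
    return match.group(1).strip(), text


def build_prompt(text: str, has_audio: bool) -> str:
    """Wrap the text in a ChatML-style template like the one echoed in the log.

    Assumption: line breaks follow the usual ChatML layout.
    """
    audio_line = "Audio 1: <image>\n" if has_audio else ""
    return (
        "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
        f"<|im_start|>user\n{audio_line}{text}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


path, text = split_prompt("What kind of sound is in {{./test.mp3}}?")
prompt = build_prompt(text, has_audio=path is not None)
```

Only the first placeholder is handled here, which matches the one-clip-per-conversation limitation listed under the known issues.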
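Model Conversion
Preparation
- Install rknn-toolkit2 v2.3.0 or higher, and rkllm-toolkit v1.1.2 or higher.
- Download this repository locally. You do not need the model files ending in .rkllm and .rknn.
- Download the Qwen2-Audio-7B-Instruct huggingface model repository locally. (https://huggingface.co./Qwen/Qwen2-Audio-7B-Instruct)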
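Converting LLM
- Copy rename_tensors.py from this repository to the root directory of the Qwen2-Audio-7B-Instruct huggingface model repository and run it. After a moment it generates 4 safetensors files, such as model-renamed-00001-of-00004.safetensors, and a json file.
- Ignore the json file and move the 4 safetensors files to the root directory of this repository.
- Run rkllm-convert.py. After a while it generates qwen.rkllm, which is the converted model.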
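Converting Audio Encoder
- Open audio_encoder_export_onnx.py and change the model path at the bottom of the file to the path of your Qwen2-Audio-7B-Instruct model folder, then run it. After a while it generates audio_encoder.onnx and many weight files.
- Run audio_encoder_convert_rknn.py all. After a while it generates audio_encoder.rknn, which is the converted audio encoder.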
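Known Issues
- Due to a suspected issue in RKLLM, loading the audio encoder and the LLM into the same Python process may cause a segmentation fault during LLM inference. Running them in separate processes works around this; see multiprocess_inference.py.
- Due to an issue in RKLLM, LLM inference segfaults on long input sequences. See https://github.com/airockchip/rknn-llm/issues/123
- Due to the limitations of RKLLM's multimodal input, only one audio clip can be loaded in the entire conversation. This could be solved by using Embedding input, but I haven't implemented it.
- Multi-turn dialogue is not implemented.
- RKLLM's w8a8 quantization seems to cause considerable precision loss, and this model's quantization calibration data is RKLLM's built-in wikitext dataset, which may lead to noticeable accuracy degradation.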
References
English README
Qwen2-Audio-7B-Instruct-rkllm
Run the powerful Qwen2-Audio-7B-Instruct audio model on RK3588!
- Inference speed (RK3588, 10s audio input): Audio encoder 12.2s (single NPU core) + LLM prefill 4.4s (282 tokens / 64.7 tps) + decoding 3.69 tps
- Memory usage (RK3588, context length 768): 11.6GB
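To put these figures together, end-to-end latency for one request can be roughly estimated as encoder time plus prefill time plus the number of generated tokens divided by the decoding rate. The sketch below only does that arithmetic with the numbers quoted above; the function name is hypothetical, and the 49-token reply length is taken from the sample run in this README and will vary with the prompt.

```python
def estimate_latency_s(encoder_s: float, prefill_s: float,
                       new_tokens: int, decode_tps: float) -> float:
    """Rough end-to-end latency: encode audio, prefill, then decode token by token."""
    return encoder_s + prefill_s + new_tokens / decode_tps


# 12.2 s encoder + 4.4 s prefill + 49 tokens at 3.69 tokens/s
total = estimate_latency_s(12.2, 4.4, 49, 3.69)
print(f"Estimated latency: {total:.1f} s")
```

With these inputs the estimate is just under 30 seconds, most of it spent in the audio encoder and in decoding.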
Usage
Clone or download this repository. The model is large, so make sure you have enough disk space.
The RKNPU2 kernel driver on your development board must be version 0.9.6 or newer to run a model this large. Check the driver version as root:
> cat /sys/kernel/debug/rknpu/version
RKNPU driver: v0.9.8
If the version is too low, please update the driver. You may need to update the kernel or check official documentation for help.
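If you prefer to script this check, the version string can be parsed and compared against the 0.9.6 minimum. This is a minimal sketch that assumes the file contains a line in the `RKNPU driver: v0.9.8` form shown above; the helper names are hypothetical, and reading the debugfs path still requires root.

```python
import re

MIN_DRIVER = (0, 9, 6)  # minimum version stated in this README
VERSION_PATH = "/sys/kernel/debug/rknpu/version"


def parse_driver_version(text: str) -> tuple:
    """Extract (major, minor, patch) from text such as 'RKNPU driver: v0.9.8'."""
    match = re.search(r"v?(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def driver_is_new_enough(text: str, minimum: tuple = MIN_DRIVER) -> bool:
    """Compare versions numerically so that 0.9.10 counts as newer than 0.9.6."""
    return parse_driver_version(text) >= minimum


def check_board(path: str = VERSION_PATH) -> bool:
    """Read the version file on the board (needs root) and check it."""
    with open(path) as handle:
        return driver_is_new_enough(handle.read())
```

Calling `check_board()` as root on the board returns `True` when the driver meets the requirement.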
- Install dependencies
pip install "numpy<2" opencv-python rknn-toolkit-lite2 librosa transformers
- Run
python multiprocess_inference.py
If performance is lower than expected, set the CPU frequency governor so the CPU always runs at its highest frequency, and pin the inference program to the big cores (taskset -c 4-7 python multiprocess_inference.py).
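The same pinning can be done from inside Python instead of through taskset. The sketch below assumes cores 4 to 7 are the big cores, as the taskset example implies, and falls back to whatever cores are available otherwise. It uses `os.sched_setaffinity`, which is Linux-only, covers only the pinning part, and does not change the frequency governor. The helper names are hypothetical.

```python
import os

BIG_CORES = {4, 5, 6, 7}  # assumption: matches `taskset -c 4-7` on RK3588


def choose_cores(available, preferred=BIG_CORES) -> set:
    """Pick the preferred cores that actually exist, else keep all available ones."""
    chosen = set(available) & set(preferred)
    return chosen or set(available)


def pin_current_process(preferred=BIG_CORES) -> set:
    """Restrict the calling process (and children started later) to the chosen cores."""
    available = os.sched_getaffinity(0)
    chosen = choose_cores(available, preferred)
    os.sched_setaffinity(0, chosen)
    return chosen
```

Calling `pin_current_process()` before any worker processes are started lets them inherit the same affinity.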
If you encounter llvm-related errors, please update the llvmlite library: pip install --upgrade llvmlite
Model Conversion
Preparation
- Install rknn-toolkit2 v2.3.0 or higher, and rkllm-toolkit v1.1.2 or higher.
- Download this repository locally. You do not need the model files ending in .rkllm and .rknn.
- Download the Qwen2-Audio-7B-Instruct huggingface model repository locally. (https://huggingface.co./Qwen/Qwen2-Audio-7B-Instruct)
Converting LLM
- Copy rename_tensors.py from this repository to the root directory of the Qwen2-Audio-7B-Instruct huggingface model repository and run it. After a moment it generates 4 safetensors files, such as model-renamed-00001-of-00004.safetensors, and a json file.
- Ignore the json file and move the 4 safetensors files to the root directory of this repository.
- Run rkllm-convert.py. After a while it generates qwen.rkllm, which is the converted model.
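The file-moving step can be scripted. This is a minimal sketch that assumes all renamed shards follow the `model-renamed-*.safetensors` naming of the example file above. The function name and both directory arguments are hypothetical placeholders for your own paths.

```python
import shutil
from pathlib import Path


def move_renamed_shards(hf_repo_dir, target_dir,
                        pattern="model-renamed-*.safetensors"):
    """Move the renamed safetensors shards and return their file names.

    The json file produced by rename_tensors.py is left where it is.
    """
    src = Path(hf_repo_dir)
    dst = Path(target_dir)
    dst.mkdir(parents=True, exist_ok=True)
    moved = []
    for shard in sorted(src.glob(pattern)):
        shutil.move(str(shard), str(dst / shard.name))
        moved.append(shard.name)
    return moved
```

Checking that the returned list has 4 entries before running rkllm-convert.py is a cheap way to catch an incomplete rename step.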
Converting Audio Encoder
- Open audio_encoder_export_onnx.py and change the model path at the bottom of the file to the path of your Qwen2-Audio-7B-Instruct model folder, then run it. After a while it generates audio_encoder.onnx and many weight files.
- Run audio_encoder_convert_rknn.py all. After a while it generates audio_encoder.rknn, which is the converted audio encoder.
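After both conversions, a quick check that the two output files exist can save a failed start. The sketch below assumes the converted files are expected in the repository root under the names mentioned above. Whether multiprocess_inference.py looks for them there is an assumption, and the helper name is hypothetical.

```python
from pathlib import Path

# Output names mentioned in the conversion steps above.
EXPECTED_ARTIFACTS = ("qwen.rkllm", "audio_encoder.rknn")


def missing_artifacts(repo_dir, expected=EXPECTED_ARTIFACTS):
    """Return the expected model files that are not present in repo_dir."""
    root = Path(repo_dir)
    return [name for name in expected if not (root / name).is_file()]
```

An empty list from `missing_artifacts(".")` means both converted models are in place.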
Known Issues
- Due to a suspected issue in RKLLM, loading the audio encoder and the LLM into the same Python process may cause a segmentation fault during LLM inference. Running them in separate processes works around this; see multiprocess_inference.py.
- Due to an issue in RKLLM, LLM inference segfaults on long input sequences. See https://github.com/airockchip/rknn-llm/issues/123
- Due to RKLLM's multimodal input limitations, only one audio clip can be loaded in the entire conversation. This could be solved using Embedding input, but I haven't implemented it.
- Multi-turn dialogue is not implemented.
- RKLLM's w8a8 quantization seems to have significant precision loss, and this model's quantization calibration data uses RKLLM's built-in wikitext dataset, which may lead to noticeable accuracy degradation.
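The multiprocess workaround from the first item can be illustrated with a small skeleton: each model lives in its own process, reports a ready signal, and is driven through queues. This is only a sketch of the pattern with stand-in workers that load no model. The signal names mirror the `audio_ready` and `llm_ready` messages in the sample log, everything else is hypothetical and is not the code in multiprocess_inference.py. It uses the `fork` start method, so it assumes Linux.

```python
import multiprocessing as mp


def worker(name, ready_signal, requests, responses):
    """Stand-in worker: a real one would load its model before signalling ready."""
    responses.put(ready_signal)
    while True:
        item = requests.get()
        if item is None:  # None is the shutdown request
            break
        responses.put(f"{name} handled {item}")


def run_demo():
    ctx = mp.get_context("fork")  # assumption: Linux, as on the RK3588 board
    procs = []
    for name, signal in (("audio", "audio_ready"), ("llm", "llm_ready")):
        requests, responses = ctx.Queue(), ctx.Queue()
        proc = ctx.Process(target=worker,
                           args=(name, signal, requests, responses),
                           daemon=True)
        proc.start()
        procs.append((proc, requests, responses))

    # Wait until both processes report that they are ready.
    results = [responses.get(timeout=10) for _, _, responses in procs]

    # Audio goes to the encoder process first, then its output goes to the LLM.
    (_, audio_req, audio_resp), (_, llm_req, llm_resp) = procs
    audio_req.put("clip")
    results.append(audio_resp.get(timeout=10))
    llm_req.put("embedding")
    results.append(llm_resp.get(timeout=10))

    for proc, requests, _ in procs:
        requests.put(None)
        proc.join(timeout=10)
    return results
```

Because the two runtimes never share an address space, a crash in one cannot corrupt the other, at the cost of copying the embeddings between processes.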
References