Qwen2-Audio-7B-Instruct-rkllm

(English README see below)

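Run the powerful Qwen2-Audio-7B-Instruct audio large language model on RK3588!

  • Inference speed (RK3588, 10 s audio input): audio encoder 12.2 s (single NPU core) + LLM prefill 4.4 s (282 tokens / 64.7 tps) + decoding at 3.69 tps
  • Memory usage (RK3588, context length 768): 11.6 GB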

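Usage

  1. Clone or download this repository. The model is large, so make sure you have enough disk space.

  2. The RKNPU2 kernel driver on your development board must be version 0.9.6 or later to run a model this large. Check the driver version as root: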

    > cat /sys/kernel/debug/rknpu/version 
    RKNPU driver: v0.9.8
    

    If the version is too old, update the driver. You may need to update the kernel, or consult the official documentation for help.

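  3. Install dependencies

    pip install "numpy<2" opencv-python rknn-toolkit-lite2 librosa transformers

  4. Run

    python multiprocess_inference.py

If performance is lower than expected, set the CPU frequency governor so that the CPU always runs at its highest frequency, and pin the inference program to the big cores (taskset -c 4-7 python multiprocess_inference.py).

If you see llvm-related errors, update the llvmlite library: pip install --upgrade llvmlite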

W rknn-toolkit-lite2 version: 2.3.0
Start loading audio encoder model (size: 1300.25 MB)
Start loading language model (size: 8037.93 MB)
I rkllm: rkllm-runtime version: 1.1.2, rknpu driver version: 0.9.8, platform: RK3588

Audio encoder loaded in 13.65 seconds
I RKNN: [20:30:05.616] RKNN Runtime Information, librknnrt version: 2.3.0 (c949ad889d@2024-11-07T11:35:33)
I RKNN: [20:30:05.616] RKNN Driver Information, version: 0.9.8
I RKNN: [20:30:05.617] RKNN Model Information, version: 6, toolkit version: 2.3.0(compiler version: 2.3.0 (c949ad889d@2024-11-07T11:39:30)), target: RKNPU v2, target platform: rk3588, framework name: ONNX, framework layout: NCHW, model inference type: static_shape
W RKNN: [20:30:07.950] query RKNN_QUERY_INPUT_DYNAMIC_RANGE error, rknn model is static shape type, please export rknn with dynamic_shapes
W Query dynamic range failed. Ret code: RKNN_ERR_MODEL_INVALID. (If it is a static shape RKNN model, please ignore the above warning message.)
Received ready signal: audio_ready
Language model loaded in 9.94 seconds
Received ready signal: llm_ready
All models loaded, starting interactive mode...

Enter your input (3 empty lines to start inference, Ctrl+C to exit, for example: 
这是什么声音{{./jntm.mp3}}?
What kind of sound is in {{./test.mp3}}?
Describe the audio in {{./jntm.mp3}}
这是什么动物的叫声{{./jntm.mp3}}?
):

这是什么声音{{./jntm.mp3}}??????


Start audio inference...
Received prompt: ====<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Audio 1: <image>
这是什么声音??????<|im_end|>
<|im_start|>assistant

====
/home/firefly/mnt/zt-back/Qwen2-7B-audiow/./multiprocess_inference.py:43: UserWarning: PySoundFile failed. Trying audioread instead.
  audio, _ = librosa.load(audio_path, sr=feature_extractor.sampling_rate)
/home/firefly/.local/lib/python3.9/site-packages/librosa/core/audio.py:184: FutureWarning: librosa.core.audio.__audioread_load
       Deprecated as of librosa version 0.10.0.
       It will be removed in librosa version 1.0.
 y, sr_native = __audioread_load(path, offset, duration, dtype)
Audio encoder inference time: 12.22 seconds
(1, 251, 4096)
(1, 251, 4096)
Start LLM inference...
🎉 完成!

Time to first token: 4.28 seconds
语音中是一段音乐,包含唱歌和乐器演奏。背景音乐里有鼓声、贝斯、钢琴和小号的演奏,同时背景能够听到胃里咕咕作响和吃东西的声音。这首歌可能是用于广告。

(finished)

--------------------------------------------------------------------------------------
Stage         Total Time (ms)  Tokens    Time per Token (ms)      Tokens per Second       
--------------------------------------------------------------------------------------
Prefill       4269.62          283       15.09                    66.28                   
Generate      13279.37         49        272.13                   3.67                    
--------------------------------------------------------------------------------------
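
The log above shows that an input containing a {{path}} placeholder reaches the LLM as an "Audio 1: <image>" line followed by the remaining text. The sketch below reproduces that transformation. It is an illustration inferred from the log only: the function names split_prompt and build_user_turn are hypothetical, and the actual code in multiprocess_inference.py may work differently.

```python
import re

# Matches a {{path}} placeholder such as {{./jntm.mp3}} (format taken from the log).
PLACEHOLDER = re.compile(r"\{\{(.+?)\}\}")


def split_prompt(user_input):
    """Return (audio_path or None, text with the first placeholder removed)."""
    match = PLACEHOLDER.search(user_input)
    if match is None:
        return None, user_input
    text = user_input[:match.start()] + user_input[match.end():]
    return match.group(1), text


def build_user_turn(user_input):
    """Build the user turn as it appears in the logged prompt (assumed layout)."""
    audio_path, text = split_prompt(user_input)
    if audio_path is None:
        return text
    return "Audio 1: <image>\n" + text


if __name__ == "__main__":
    print(split_prompt("What kind of sound is in {{./test.mp3}}?"))
    print(build_user_turn("Describe the audio in {{./jntm.mp3}}"))
```

Only the first placeholder is handled, which matches the one-clip-per-conversation limitation listed under the known issues.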

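Model Conversion

Preparation

  1. Install rknn-toolkit2 v2.3.0 or later, and rkllm-toolkit v1.1.2 or later.
  2. Download this repository. You do not need the model files ending in .rkllm and .rknn.
  3. Download the Qwen2-Audio-7B-Instruct Hugging Face model repository. (https://huggingface.co./Qwen/Qwen2-Audio-7B-Instruct)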

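Converting LLM

  1. Copy rename_tensors.py from this repository to the root directory of the Qwen2-Audio-7B-Instruct Hugging Face model repository and run it. After a short wait it generates 4 safetensors files, such as model-renamed-00001-of-00004.safetensors, and a json file.
  2. Ignore the json file and move the 4 safetensors files to the root directory of this repository.
  3. Run rkllm-convert.py. After a while it generates qwen.rkllm, which is the converted model.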

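Converting Audio Encoder

  1. Open audio_encoder_export_onnx.py and change the model path at the bottom of the file to the path of your Qwen2-Audio-7B-Instruct model folder, then run it. After a while it generates audio_encoder.onnx and many weight files.
  2. Run audio_encoder_convert_rknn.py all. After a while it generates audio_encoder.rknn, which is the converted audio encoder.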

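Known Issues

  • Due to a suspected issue in RKLLM, loading the audio encoder and the LLM into the same Python process may cause a segmentation fault during LLM inference. Running them in separate processes avoids this. See multiprocess_inference.py.
  • Due to an issue in RKLLM, LLM inference segfaults with long input sequences. See https://github.com/airockchip/rknn-llm/issues/123
  • Due to a limitation of RKLLM's multimodal input, only one audio clip can be loaded in the entire conversation. This could be solved with Embedding input, but I haven't implemented it.
  • Multi-turn dialogue is not implemented.
  • RKLLM's w8a8 quantization seems to lose a significant amount of precision, and this model's quantization was calibrated on RKLLM's built-in wikitext dataset, which may degrade accuracy noticeably.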

English README

Qwen2-Audio-7B-Instruct-rkllm

Run the powerful Qwen2-Audio-7B-Instruct audio model on RK3588!

  • Inference speed (RK3588, 10s audio input): Audio encoder 12.2s (single NPU core) + LLM prefill 4.4s (282 tokens / 64.7 tps) + decoding 3.69 tps
  • Memory usage (RK3588, context length 768): 11.6GB

Usage

  1. Clone or download this repository. The model is large, so make sure you have enough disk space.

  2. The RKNPU2 kernel driver on your development board must be version 0.9.6 or later to run a model this large. Check the driver version as root:

    > cat /sys/kernel/debug/rknpu/version 
    RKNPU driver: v0.9.8
    

    If the version is too low, please update the driver. You may need to update the kernel or check official documentation for help.

  3. Install dependencies

    pip install "numpy<2" opencv-python rknn-toolkit-lite2 librosa transformers

  4. Run

    python multiprocess_inference.py
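
The driver check in step 2 can also be scripted. The sketch below parses a line in the format shown there ("RKNPU driver: v0.9.8") and compares it against the 0.9.6 minimum. The helper names are hypothetical, and the output format is assumed to match the example in step 2.

```python
import re
from pathlib import Path

# Path from step 2; reading it normally requires root.
VERSION_FILE = Path("/sys/kernel/debug/rknpu/version")
MIN_VERSION = (0, 9, 6)


def parse_rknpu_version(text):
    """Extract (major, minor, patch) from a line such as 'RKNPU driver: v0.9.8'."""
    match = re.search(r"v(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def driver_is_new_enough(text, minimum=MIN_VERSION):
    """Compare versions as integer tuples so that 0.10.0 counts as newer than 0.9.6."""
    return parse_rknpu_version(text) >= minimum


if __name__ == "__main__":
    try:
        version_text = VERSION_FILE.read_text()
    except OSError as exc:
        print(f"could not read {VERSION_FILE}: {exc}")
    else:
        print("driver OK" if driver_is_new_enough(version_text) else "driver too old, please update")
```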

If performance is lower than expected, set the CPU frequency governor so that the CPU always runs at its highest frequency, and pin the inference program to the big cores (taskset -c 4-7 python multiprocess_inference.py).
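
If you would rather pin the process from inside Python than wrap it in taskset, the sketch below does the equivalent with os.sched_setaffinity, which is available on Linux only. Cores 4-7 are taken from the taskset example above. The helper names are hypothetical, and this does not change the frequency governor.

```python
import os

BIG_CORES = {4, 5, 6, 7}  # same cores as "taskset -c 4-7"


def pick_cores(available, preferred=BIG_CORES):
    """Return the preferred cores that exist, or all available cores if none do."""
    chosen = set(available) & set(preferred)
    return chosen or set(available)


def pin_to_big_cores():
    """Restrict the current process to the big cores (Linux only)."""
    cores = pick_cores(os.sched_getaffinity(0))
    os.sched_setaffinity(0, cores)
    return cores
```

Call pin_to_big_cores() before loading the models so that any threads or child processes created afterwards inherit the same affinity.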

If you encounter llvm-related errors, please update the llvmlite library: pip install --upgrade llvmlite

Model Conversion

Preparation

  1. Install rknn-toolkit2 v2.3.0 or higher, and rkllm-toolkit v1.1.2 or higher.
  2. Download this repository locally, but you don't need to download the model files ending with .rkllm and .rknn.
  3. Download the Qwen2-Audio-7B-Instruct huggingface model repository locally. (https://huggingface.co./Qwen/Qwen2-Audio-7B-Instruct)

Converting LLM

  1. Copy rename_tensors.py from this repository to the root directory of the Qwen2-Audio-7B-Instruct Hugging Face model repository and run it. After a short wait it generates 4 safetensors files, such as model-renamed-00001-of-00004.safetensors, and a json file.
  2. Ignore the json file and move the 4 safetensors files to the root directory of this repository.
  3. Run rkllm-convert.py. After a while it generates qwen.rkllm, which is the converted model.
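
Step 2 above can be scripted as follows. This is a minimal sketch: the function name and its arguments are hypothetical, and the file pattern is inferred from the single example name model-renamed-00001-of-00004.safetensors given in step 1.

```python
import shutil
from pathlib import Path


def move_renamed_shards(model_dir, repo_dir, expected=4):
    """Move model-renamed-*.safetensors from model_dir into repo_dir.

    Returns the new paths. The json file is left where it is.
    """
    shards = sorted(Path(model_dir).glob("model-renamed-*.safetensors"))
    if len(shards) != expected:
        raise FileNotFoundError(
            f"expected {expected} renamed shards, found {len(shards)}"
        )
    repo = Path(repo_dir)
    return [Path(shutil.move(str(shard), str(repo / shard.name))) for shard in shards]
```

For example, move_renamed_shards("path/to/Qwen2-Audio-7B-Instruct", ".") run from this repository's root, where the first path is a stand-in for wherever you downloaded the model.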

Converting Audio Encoder

  1. Open audio_encoder_export_onnx.py and change the model path at the bottom of the file to the path of your Qwen2-Audio-7B-Instruct model folder, then run it. After a while it generates audio_encoder.onnx and many weight files.
  2. Run audio_encoder_convert_rknn.py all. After a while it generates audio_encoder.rknn, which is the converted audio encoder.

Known Issues

  • Due to a suspected issue in RKLLM, loading the audio encoder and the LLM into the same Python process may cause a segmentation fault during LLM inference. Running them in separate processes avoids this. See multiprocess_inference.py.
  • Due to an issue in RKLLM, LLM inference segfaults with long input sequences. See https://github.com/airockchip/rknn-llm/issues/123
  • Due to a limitation of RKLLM's multimodal input, only one audio clip can be loaded in the entire conversation. This could be solved with Embedding input, but I haven't implemented it.
  • Multi-turn dialogue is not implemented.
  • RKLLM's w8a8 quantization seems to lose a significant amount of precision, and this model's quantization was calibrated on RKLLM's built-in wikitext dataset, which may degrade accuracy noticeably.
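
The multi-process workaround from the first item can be pictured as two worker processes, each loading exactly one model and talking to the parent through queues. The sketch below shows that shape only. The loader functions are placeholders that return fake callables, every name is hypothetical, and it is not the code in multiprocess_inference.py.

```python
import multiprocessing as mp


def load_audio_encoder():
    """Placeholder for loading audio_encoder.rknn; returns a fake encode function."""
    return lambda audio_path: f"embedding({audio_path})"


def load_llm():
    """Placeholder for loading qwen.rkllm; returns a fake generate function."""
    return lambda prompt, embedding: f"reply to {prompt!r} using {embedding}"


def encoder_worker(requests, results):
    encode = load_audio_encoder()  # loaded only inside this process
    for audio_path in iter(requests.get, None):  # None is the shutdown signal
        results.put(encode(audio_path))


def llm_worker(requests, results):
    generate = load_llm()  # the LLM never shares a process with the encoder
    for prompt, embedding in iter(requests.get, None):
        results.put(generate(prompt, embedding))


def run_once(prompt, audio_path):
    """Start both workers, run one request through them, then shut them down."""
    enc_in, enc_out, llm_in, llm_out = (mp.Queue() for _ in range(4))
    workers = [
        mp.Process(target=encoder_worker, args=(enc_in, enc_out)),
        mp.Process(target=llm_worker, args=(llm_in, llm_out)),
    ]
    for worker in workers:
        worker.start()
    enc_in.put(audio_path)
    llm_in.put((prompt, enc_out.get()))
    reply = llm_out.get()
    enc_in.put(None)
    llm_in.put(None)
    for worker in workers:
        worker.join()
    return reply
```

Call run_once(...) from under an if __name__ == "__main__": guard so that it also works with the spawn and forkserver start methods.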
