A bilingual instruction-tuned model based on [Qwen-7B](https://huggingface.co./Qwen/Qwen-7B), built for [QAnything](https://github.com/netease-youdao/QAnything).

1. Run Qwen-7B-QAnything using the FastChat API with the Hugging Face Transformers runtime backend
```bash
## Step 1. Prepare the QAnything project and download the local Embedding/Rerank models.
git clone https://github.com/netease-youdao/QAnything.git
cd /path/to/QAnything && mkdir -p tmp && cd tmp
git lfs install
git clone https://huggingface.co./netease-youdao/QAnything
unzip QAnything/models.zip
cd - && mv tmp/models .
```
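After the move, the Embedding/Rerank checkpoints should sit under `models/` in the project root. A quick sanity check (the exact subdirectory names depend on the contents of models.zip):

```bash
## Confirm the unpacked Embedding/Rerank models landed in the project root.
ls models    # expect model subdirectories here, not an empty folder
```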

```bash
## Step 2. Download the public LLM model (e.g., Qwen-7B-QAnything) and save it to "/path/to/QAnything/assets/custom_models".
cd /path/to/QAnything/assets/custom_models
git clone https://huggingface.co./netease-youdao/Qwen-7B-QAnything
```
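The weights are stored with Git LFS, so it is worth confirming that the large files were actually fetched rather than left as pointer stubs. A minimal check (the file patterns are illustrative; match them to the shard names in the repository):

```bash
## Verify the LFS-tracked weight files were fully downloaded (not pointer stubs).
cd Qwen-7B-QAnything
git lfs pull                              # fetch any files still stored as LFS pointers
ls -lh *.safetensors *.bin 2>/dev/null    # weight shards should be GB-sized, not ~130 bytes
```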

```bash
## Step 3. Execute the service startup command. "-b hf" selects the Hugging Face
## Transformers backend, which by default loads the model in 8-bit but runs bf16
## inference to save VRAM.
cd /path/to/QAnything
bash ./run.sh -c local -i 0 -b hf -m Qwen-7B-QAnything -t qwen-7b-qanything
```
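Once the service reports ready, you can smoke-test the LLM through FastChat's OpenAI-compatible chat endpoint. This is a minimal sketch: the address (localhost:8000) and route are assumptions, so confirm the actual endpoint in the run.sh output.

```bash
## Smoke test against the FastChat OpenAI-compatible API.
## NOTE: localhost:8000 is an assumption -- check the run.sh logs for the real address.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen-7b-qanything",
        "messages": [{"role": "user", "content": "Hello, please introduce yourself."}],
        "temperature": 0.7
      }'
```

The model name matches the `-t qwen-7b-qanything` tag passed to run.sh above.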
2. Run Qwen-7B-QAnything using the FastChat API with the vLLM runtime backend

```bash
## Step 1. Prepare the QAnything project and download the local Embedding/Rerank models.
git clone https://github.com/netease-youdao/QAnything.git
cd /path/to/QAnything && mkdir -p tmp && cd tmp
git lfs install
git clone https://huggingface.co./netease-youdao/QAnything
unzip QAnything/models.zip
cd - && mv tmp/models .
```

```bash
## Step 2. Download the public LLM model (e.g., Qwen-7B-QAnything) and save it to "/path/to/QAnything/assets/custom_models".
cd /path/to/QAnything/assets/custom_models
git clone https://huggingface.co./netease-youdao/Qwen-7B-QAnything
```

```bash
## Step 3. Execute the service startup command. "-b vllm" selects the vLLM backend,
## which runs bf16 inference by default.
## Note: adjust gpu_memory_utilization to your model size to avoid out-of-memory errors.
## The default is 0.81 for a 7B model; here it is set to 0.85 via "-r 0.85".
cd /path/to/QAnything
bash ./run.sh -c local -i 0 -b vllm -m Qwen-7B-QAnything -t qwen-7b-qanything -p 1 -r 0.85
```
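Before changing `-r`, check how much VRAM each GPU actually has free, since vLLM tries to reserve that fraction of total memory for weights plus KV cache. A quick check (the arithmetic in the comment is illustrative):

```bash
## Show per-GPU total and used memory to help pick a safe gpu_memory_utilization.
nvidia-smi --query-gpu=index,name,memory.total,memory.used --format=csv
## Example: on a 24576 MiB card, "-r 0.85" lets vLLM claim about 20890 MiB,
## so everything else on that GPU must fit in the remaining ~3686 MiB.
```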


## License Agreement

This project is open-sourced under the Tongyi Qianwen Research License Agreement. You can view the complete agreement here: [Tongyi Qianwen RESEARCH LICENSE AGREEMENT](https://github.com/QwenLM/Qwen/blob/main/Tongyi%20Qianwen%20RESEARCH%20LICENSE%20AGREEMENT).

When using this project, please ensure that your usage complies with the terms and conditions of the license agreement.
