A quantized model based on taide/Llama-3.1-TAIDE-LX-8B-Chat, produced with llama.cpp release b4739.

How to run the quantized TAIDE model in Kuwa

Windows

  1. Install Kuwa using the Kuwa v0.3.4 Windows installer
  2. Go to the C:\kuwa\GenAI OS\windows\executors\taide directory, back up the original model taide-8b-a.3-q4_k_m.gguf somewhere outside this directory, and delete run.bat
  3. Download Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf from tetf/Llama-3.1-TAIDE-LX-8B-Chat-GGUF into the C:\kuwa\GenAI OS\windows\executors\taide directory
  4. Run C:\kuwa\GenAI OS\windows\executors\taide\init.bat with the following settings
    • Enter the option number (1-5): 3
    • Enter the model name: Llama-3.1 TAIDE LX-8B Chat Q4_K_M
    • Enter the access code: llama-3.1-taide-lx-8b-chat-q4_k_m
    • Arguments to use (...): --stop "<|eot_id|>"
  5. Restart Kuwa and the new TAIDE model will be available
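Step 3 can be done from a browser, or you can construct the direct download URL yourself: Hugging Face serves raw repository files under the /resolve/<revision>/ path. A minimal sketch:

```python
# Build the direct download URL for the quantized model file.
# Hugging Face exposes raw repo files at /resolve/<revision>/<filename>.
repo_id = "tetf/Llama-3.1-TAIDE-LX-8B-Chat-GGUF"
filename = "Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf"
url = f"https://huggingface.co/{repo_id}/resolve/main/{filename}"
print(url)
```

You can then fetch that URL with a browser or any download tool and place the file in the taide directory from step 2.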

Docker

  1. Download the Kuwa v0.3.4 source code
  2. Install Kuwa by following the installation documentation; you can also refer to the community-contributed guide
  3. Download Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf from tetf/Llama-3.1-TAIDE-LX-8B-Chat-GGUF into any directory
  4. Create the configuration file genai-os/docker/compose/sample/taide-llamacpp.yaml with the following content
services:
  llamacpp-executor:
    image: kuwaai/model-executor
    environment:
      EXECUTOR_TYPE: llamacpp
      EXECUTOR_ACCESS_CODE: llama-3.1-taide-lx-8b-chat-q4_k_m
      EXECUTOR_NAME: Llama-3.1 TAIDE LX-8B Chat Q4_K_M
      EXECUTOR_IMAGE: llamacpp.png # Refer to src/multi-chat/public/images
    depends_on:
      - executor-builder
      - kernel
      - multi-chat
    command: ["--model_path", "/var/model/Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf", "--temperature", "0"]
    # or use GPU
    # command: ["--model_path", "/var/model/Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf", "--ngl", "-1", "--temperature", "0"]
    restart: unless-stopped
    volumes: ["/path/to/Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf:/var/model/Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf"] # Remember to change path
    # Uncomment to use GPU
    # deploy:
    #   resources:
    #     reservations:
    #       devices:
    #       - driver: nvidia
    #         device_ids: ['0']
    #         capabilities: [gpu]
    networks: ["backend"]
  5. Run ./run.sh in the genai-os/docker directory to start the new model
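Before starting the container, it can help to sanity-check that the file you mounted is actually a GGUF file: every GGUF file begins with the 4-byte ASCII magic "GGUF". A small sketch (the helper name is my own):

```python
def is_gguf(path: str) -> bool:
    """Return True if the file starts with the GGUF magic bytes."""
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"

# Example (path is illustrative):
# is_gguf("/path/to/Llama-3.1-TAIDE-LX-8B-Chat-Q4_K_M.gguf")
```

An incomplete or HTML-error-page download will fail this check immediately, which is cheaper than waiting for the executor to crash at startup.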

How to quantize

For the method of quantizing Llama-3.1-TAIDE-LX-8B-Chat and the challenges involved, see San-Li Hsu's note "使用llama.cpp將Hugging Face模型權重(safetensors)轉換成GGUF並進行量化" (Using llama.cpp to convert Hugging Face model weights (safetensors) to GGUF and quantize them).
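As a rough sanity check on a quantization run, you can estimate the expected file size from the parameter count. Q4_K_M averages on the order of 4.8 bits per weight (an approximate figure I am assuming here, since K-quants mix bit widths across tensors), so for this 8.52B-parameter model the output should come in around 5 GB:

```python
params = 8.52e9          # parameter count of Llama-3.1-TAIDE-LX-8B-Chat
bits_per_weight = 4.8    # approximate average for Q4_K_M (assumption)
size_gb = params * bits_per_weight / 8 / 1e9
print(f"expected size: ~{size_gb:.1f} GB")  # ~5.1 GB
```

A result far from this estimate (for example, the full ~16 GB of the FP16 weights) usually means the quantization step was skipped or failed.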
