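### LLaMA 65B ggml model weights for running alpaca.cpp

### How the 65B ggml weights were made

#### 1. Clone the 65B model data

```shell
git clone https://huggingface.co/datasets/nyanko7/LLaMA-65B/
```

#### 2. Clone alpaca.cpp

```shell
git clone https://github.com/antimatter15/alpaca.cpp
```

#### 3. Convert and quantize the weights (quantize.sh)

```shell
# Run from the directory that contains both LLaMA-65B/ and alpaca.cpp/.
# The converter expects tokenizer.model in the parent of the model directory.
mv LLaMA-65B/tokenizer.model ./
cd alpaca.cpp
# Convert the PyTorch checkpoint to ggml; the trailing 1 selects f16 output.
python convert-pth-to-ggml.py ../LLaMA-65B/ 1
mkdir -p models/65B
# The 65B model is written as several parts: ggml-model-f16.bin, ggml-model-f16.bin.1, ...
mv ../LLaMA-65B/ggml-model-f16.bin models/65B/
mv ../LLaMA-65B/ggml-model-f16.bin.* models/65B/
# Quantize every part to 4-bit q4_0 (quantize.sh calls ./quantize, which must be built first).
bash quantize.sh 65B
```

#### 4. Upload the weight files

Uploading directly was slow (it was taking almost 2 days), so I worked around it:

- I used https://tmp.link/ as temporary storage.
- I used Colab and the Hugging Face API to upload the files.

### Run

```shell
git clone https://github.com/antimatter15/
./chat -m alpaca.cpp_65b_ggml/ggml-model-q4_0.bin
```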
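The 65B conversion produces several part files, but `./chat` is given only the path of the first one, so a missing or half-uploaded part is easy to overlook. The helper below is a minimal sketch for checking that every part is present before launching. `check_parts` is a hypothetical name, and the sketch assumes the llama.cpp-era layout in which the 65B model is split into 8 parts named `<base>`, `<base>.1`, ..., `<base>.7`; adjust the part count if your files differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of alpaca.cpp): verify that all part files of a
# multi-part ggml model exist. Assumes parts are named <base>, <base>.1, <base>.2, ...
check_parts() {
  local base="$1"      # path to the first part, e.g. alpaca.cpp_65b_ggml/ggml-model-q4_0.bin
  local n_parts="$2"   # assumed to be 8 for 65B
  local missing=0
  local i=0
  local f
  while [ "$i" -lt "$n_parts" ]; do
    # Part 0 has no numeric suffix; later parts are <base>.1, <base>.2, ...
    if [ "$i" -eq 0 ]; then f="$base"; else f="$base.$i"; fi
    if [ ! -f "$f" ]; then
      echo "missing: $f"
      missing=$((missing + 1))
    fi
    i=$((i + 1))
  done
  if [ "$missing" -eq 0 ]; then
    echo "ok: all $n_parts parts present"
  fi
  # Non-zero exit status (the number of missing parts) signals a problem.
  return "$missing"
}
```

For example, `check_parts alpaca.cpp_65b_ggml/ggml-model-q4_0.bin 8 && ./chat -m alpaca.cpp_65b_ggml/ggml-model-q4_0.bin` only starts the chat once every assumed part is on disk.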