Init commit

Files changed (6)

README.md CHANGED

@@ -2,4 +2,37 @@
 ---
 license: apache-2.0
 ---
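+## Model Introduction
+This release is a Chinese SFT version of mistral-large-instruct-2407 that has undergone special processing. It behaves like the original instruct version but handles Chinese content and emoji more naturally, improving both question-answering performance and user experience.
+Features: improved handling of Chinese and emoji without degrading the capabilities of the original instruct model. In our tests, this Chinese SFT version outperforms the llama3_1-405B Chinese model in question answering.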
+![demo](./images/demo.png)
+![demo1](./images/demo1.png)
+## Training Details
+- LoRA rank 128, alpha 256
+![detail](./images/detail.png)
+## Model Download
+Clone the model with Git LFS:
+```shell
+git lfs install
+git clone https://huggingface.co/opencsg/CSG-Wukong-Chinese-Mistral-Large2-123B
+```
+## LoRA Parameter Merging Guide
+To merge the LoRA parameters into the base model, use the following Python code:
+```python
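+from transformers import AutoModelForCausalLM
+from peft import PeftModel
+
+# Load the base model, then attach the LoRA adapter from this repository.
+base_model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-Large-Instruct-2407")
+peft_model_id = "opencsg/CSG-Wukong-Chinese-Mistral-Large2-123B"
+model = PeftModel.from_pretrained(base_model, peft_model_id)
+
+# merge_and_unload() folds the LoRA weights into the base weights and returns the merged model.
+model = model.merge_and_unload()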
+```
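
For readers unfamiliar with what the merge step does numerically, here is a minimal sketch in plain Python. It assumes the standard (non-rsLoRA) formulation, in which each adapted weight becomes `W + (lora_alpha / r) * (B @ A)`. The helper names `matmul` and `merge_lora` and the toy dimensions are hypothetical and chosen only for illustration; this is not PEFT's actual implementation.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def rand_matrix(rows, cols):
    """Random matrix with entries in [-1, 1]."""
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def merge_lora(weight, lora_a, lora_b, r, lora_alpha):
    """Return W + (lora_alpha / r) * (B @ A) (assumed non-rsLoRA scaling)."""
    scaling = lora_alpha / r
    delta = matmul(lora_b, lora_a)
    return [[w + scaling * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


random.seed(0)
d_out, d_in, r, lora_alpha = 4, 3, 2, 8  # toy sizes, not the real model's

W = rand_matrix(d_out, d_in)   # frozen base weight
A = rand_matrix(r, d_in)       # LoRA down-projection
B = rand_matrix(d_out, r)      # LoRA up-projection
x = rand_matrix(d_in, 1)       # input as a column vector

# Unmerged path: base output plus the scaled low-rank correction.
scaling = lora_alpha / r
base_out = matmul(W, x)
lora_out = matmul(B, matmul(A, x))
unmerged_out = [[b[0] + scaling * l[0]] for b, l in zip(base_out, lora_out)]

# Merged path: a single matrix multiply with the folded weight.
merged_out = matmul(merge_lora(W, A, B, r, lora_alpha), x)

max_diff = max(abs(m[0] - u[0]) for m, u in zip(merged_out, unmerged_out))
print(f"max difference between merged and unmerged outputs: {max_diff:.2e}")
```

Both paths produce the same output up to floating-point rounding, which is why a merged model can be served without the `peft` dependency.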

adapter_config.json ADDED

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "/data/models/mistralai/Mistral-Large-Instruct-2407",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": [],
+  "peft_type": "LORA",
+  "r": 128,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "q_proj",
+    "v_proj",
+    "k_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
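
As a quick way to read this config, the sketch below parses a subset of the fields above and derives the effective LoRA scaling factor. It assumes the usual PEFT convention: `lora_alpha / r` when `use_rslora` is false, and `lora_alpha / sqrt(r)` when it is true. The helper name `lora_scaling` is hypothetical.

```python
import json
import math

# Subset of the fields from adapter_config.json in this commit.
ADAPTER_CONFIG = """
{
  "lora_alpha": 32,
  "r": 128,
  "lora_dropout": 0.05,
  "target_modules": ["q_proj", "v_proj", "k_proj"],
  "use_rslora": false
}
"""


def lora_scaling(config):
    """Scaling applied to the low-rank update B @ A (assumed PEFT convention)."""
    if config.get("use_rslora", False):
        return config["lora_alpha"] / math.sqrt(config["r"])
    return config["lora_alpha"] / config["r"]


config = json.loads(ADAPTER_CONFIG)
scaling = lora_scaling(config)
print(f"scaling = {scaling}")
print(f"adapted modules = {sorted(config['target_modules'])}")
```

With the values in this file (`r` = 128, `lora_alpha` = 32), the scaling is 0.25, and only the attention query, key, and value projections carry adapter weights.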

adapter_model.safetensors ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:6f9db1b2db0d219a416d5f20294604e30ca6af3674c8e64d4755b3e2e9de26b5
+size 1153507360

images/demo.png ADDED

images/demo1.png ADDED

images/detail.png ADDED