Upload 16 files
- .gitattributes +2 -0
- README.md +219 -0
- config.json +37 -0
- generation_config.json +6 -0
- huggingface-metadata.txt +16 -0
- measurement.json +0 -0
- model.safetensors.index.json +370 -0
- output-00001-of-00005.safetensors +3 -0
- output-00002-of-00005.safetensors +3 -0
- output-00003-of-00005.safetensors +3 -0
- output-00004-of-00005.safetensors +3 -0
- output-00005-of-00005.safetensors +3 -0
- params.json +12 -0
- special_tokens_map.json +0 -0
- tekken.json +3 -0
- tokenizer.json +3 -0
- tokenizer_config.json +0 -0
.gitattributes
CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
|
33 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
33 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
36 |
+
tekken.json filter=lfs diff=lfs merge=lfs -text
|
37 |
+
tokenizer.json filter=lfs diff=lfs merge=lfs -text
|
README.md
ADDED
@@ -0,0 +1,219 @@
1 |
+
---
|
2 |
+
language:
|
3 |
+
- en
|
4 |
+
- fr
|
5 |
+
- de
|
6 |
+
- es
|
7 |
+
- it
|
8 |
+
- pt
|
9 |
+
- zh
|
10 |
+
- ja
|
11 |
+
- ru
|
12 |
+
- ko
|
13 |
+
license: apache-2.0
|
14 |
+
base_model:
|
15 |
+
- mistralai/Mistral-Small-24B-Instruct-2501
|
16 |
+
quantized_by: DeusImperator
|
17 |
+
---
|
18 |
+
|
19 |
+
# Mistral-Small-24B-Instruct-2501 - EXL2 6.5bpw L
|
20 |
+
|
21 |
+
This is a 6.5bpw EXL2 quant of [mistralai/Mistral-Small-24B-Instruct-2501](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501).
|
22 |
+
|
23 |
+
This quant was made with exllamav2-0.2.7 using the default calibration dataset and an extended quantization sample length (8k instead of the default 2k). It also uses `-head_bits=8` and the maximum-accuracy quantization (8bpw) for the first and last layers; all other layers use the normally chosen methods. The method and the name (6.5bpw_L) are inspired by quants like Q4_K_L and Q6_K_L made by [bartowski](https://huggingface.co/bartowski).
|
24 |
+
|
25 |
+
It fits nicely in 24GB of VRAM on Windows with a 20k fp16 context (the full 32k context should fit with the q8 cache in exl2).
|
26 |
+
|
27 |
+
## Prompt Templates
|
28 |
+
|
29 |
+
Uses Mistral V7-tekken:
|
30 |
+
```
|
31 |
+
<s>[SYSTEM_PROMPT]<system prompt>[/SYSTEM_PROMPT][INST]<user message>[/INST]<assistant response></s>[INST]<user message>[/INST]
|
32 |
+
```
|
33 |
+
|
34 |
+
### Original readme below
|
35 |
+
|
36 |
+
---
|
37 |
+
|
38 |
+
# Model Card for Mistral-Small-24B-Instruct-2501
|
39 |
+
|
40 |
+
Mistral Small 3 (2501) sets a new benchmark in the "small" Large Language Models category below 70B, boasting 24B parameters and achieving state-of-the-art capabilities comparable to larger models!
|
41 |
+
This model is an instruction-fine-tuned version of the base model: [Mistral-Small-24B-Base-2501](https://huggingface.co/mistralai/Mistral-Small-24B-Base-2501).
|
42 |
+
|
43 |
+
Mistral Small can be deployed locally and is exceptionally "knowledge-dense", fitting in a single RTX 4090 or a 32GB RAM MacBook once quantized.
|
44 |
+
Perfect for:
|
45 |
+
- Fast response conversational agents.
|
46 |
+
- Low latency function calling.
|
47 |
+
- Subject matter experts via fine-tuning.
|
48 |
+
- Local inference for hobbyists and organizations handling sensitive data.
|
49 |
+
|
50 |
+
For enterprises that need specialized capabilities (increased context, particular modalities, domain specific knowledge, etc.), we will be releasing commercial models beyond what Mistral AI contributes to the community.
|
51 |
+
|
52 |
+
This release demonstrates our commitment to open source, serving as a strong base model.
|
53 |
+
|
54 |
+
Learn more about Mistral Small in our [blog post](https://mistral.ai/news/mistral-small-3/).
|
55 |
+
|
56 |
+
## Key Features
|
57 |
+
- **Multilingual:** Supports dozens of languages, including English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish.
|
58 |
+
- **Agent-Centric:** Offers best-in-class agentic capabilities with native function calling and JSON outputting.
|
59 |
+
- **Advanced Reasoning:** State-of-the-art conversational and reasoning capabilities.
|
60 |
+
- **Apache 2.0 License:** Open license allowing usage and modification for both commercial and non-commercial purposes.
|
61 |
+
- **Context Window:** A 32k context window.
|
62 |
+
- **System Prompt:** Maintains strong adherence and support for system prompts.
|
63 |
+
- **Tokenizer:** Utilizes a Tekken tokenizer with a 131k vocabulary size.
|
64 |
+
|
65 |
+
### Basic Instruct Template (V7-Tekken)
|
66 |
+
|
67 |
+
```
|
68 |
+
<s>[SYSTEM_PROMPT]<system prompt>[/SYSTEM_PROMPT][INST]<user message>[/INST]<assistant response></s>[INST]<user message>[/INST]
|
69 |
+
```
|
70 |
+
*`<system prompt>`, `<user message>` and `<assistant response>` are placeholders.*
|
71 |
+
|
72 |
+
***Please make sure to use [mistral-common](https://github.com/mistralai/mistral-common) as the source of truth***
|
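The template above can be illustrated with a small string formatter. This is only a sketch for reading the layout: the function name `format_v7_tekken` and its argument shapes are assumptions made for this example, and it builds a plain string rather than calling mistral-common, which remains the source of truth for real tokenization.

```python
from typing import List, Optional, Tuple


def format_v7_tekken(
    system_prompt: Optional[str],
    turns: List[Tuple[str, Optional[str]]],
) -> str:
    """Render a conversation in the V7-Tekken layout shown above.

    `turns` holds (user_message, assistant_response) pairs. The assistant
    response of the last turn may be None, which leaves the prompt open
    for the model to complete.
    """
    parts = ["<s>"]
    if system_prompt is not None:
        parts.append(f"[SYSTEM_PROMPT]{system_prompt}[/SYSTEM_PROMPT]")
    for user_message, assistant_response in turns:
        parts.append(f"[INST]{user_message}[/INST]")
        if assistant_response is not None:
            # Completed assistant turns are closed with the end-of-sequence marker.
            parts.append(f"{assistant_response}</s>")
    return "".join(parts)


prompt = format_v7_tekken(
    "Answer briefly.",
    [("Hi!", "Hello."), ("How do I say 'cat' in French?", None)],
)
print(prompt)
```

Frontends that take a raw prompt string can use the printed layout for comparison; anything that tokenizes should still go through mistral-common so that the control tokens are encoded as special tokens rather than as literal text.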
73 |
+
|
74 |
+
## Usage
|
75 |
+
|
76 |
+
The model can be used with the following frameworks:
|
77 |
+
- [`vllm`](https://github.com/vllm-project/vllm): See [here](#vLLM)
|
78 |
+
- [`transformers`](https://github.com/huggingface/transformers): See [here](#Transformers)
|
79 |
+
|
80 |
+
### vLLM
|
81 |
+
|
82 |
+
We recommend using this model with the [vLLM library](https://github.com/vllm-project/vllm)
|
83 |
+
to implement production-ready inference pipelines.
|
84 |
+
|
85 |
+
**_Installation_**
|
86 |
+
|
87 |
+
Make sure you install [`vLLM >= 0.6.4`](https://github.com/vllm-project/vllm/releases/tag/v0.6.4):
|
88 |
+
|
89 |
+
```
|
90 |
+
pip install --upgrade vllm
|
91 |
+
```
|
92 |
+
|
93 |
+
Also make sure you have [`mistral_common >= 1.5.2`](https://github.com/mistralai/mistral-common/releases/tag/v1.5.2) installed:
|
94 |
+
|
95 |
+
```
|
96 |
+
pip install --upgrade mistral_common
|
97 |
+
```
|
98 |
+
|
99 |
+
You can also use a ready-to-go Docker image, either built from the [Dockerfile](https://github.com/vllm-project/vllm/blob/main/Dockerfile) or pulled from [Docker Hub](https://hub.docker.com/layers/vllm/vllm-openai/latest/images/sha256-de9032a92ffea7b5c007dad80b38fd44aac11eddc31c435f8e52f3b7404bbf39).
|
100 |
+
|
101 |
+
#### Server
|
102 |
+
|
103 |
+
We recommend that you use Mistral-Small-24B-Instruct-2501 in a server/client setting.
|
104 |
+
|
105 |
+
1. Spin up a server:
|
106 |
+
|
107 |
+
```
|
108 |
+
vllm serve mistralai/Mistral-Small-24B-Instruct-2501 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice
|
109 |
+
```
|
110 |
+
|
111 |
+
**Note:** Running Mistral-Small-24B-Instruct-2501 on GPU requires 60 GB of GPU RAM.
|
112 |
+
|
113 |
+
|
114 |
+
2. To query the server, you can use a simple Python snippet.
|
115 |
+
|
116 |
+
```py
|
117 |
+
import requests
|
118 |
+
import json
|
119 |
+
from datetime import datetime, timedelta
|
120 |
+
|
121 |
+
url = "http://<your-server>:8000/v1/chat/completions"
|
122 |
+
headers = {"Content-Type": "application/json", "Authorization": "Bearer token"}
|
123 |
+
|
124 |
+
model = "mistralai/Mistral-Small-24B-Instruct-2501"
|
125 |
+
|
126 |
+
messages = [
|
127 |
+
{
|
128 |
+
"role": "system",
|
129 |
+
"content": "You are a conversational agent that always answers straight to the point, always end your accurate response with an ASCII drawing of a cat."
|
130 |
+
},
|
131 |
+
{
|
132 |
+
"role": "user",
|
133 |
+
"content": "Give me 5 non-formal ways to say 'See you later' in French."
|
134 |
+
},
|
135 |
+
]
|
136 |
+
|
137 |
+
data = {"model": model, "messages": messages}
|
138 |
+
|
139 |
+
response = requests.post(url, headers=headers, data=json.dumps(data))
|
140 |
+
print(response.json()["choices"][0]["message"]["content"])
|
141 |
+
|
142 |
+
# Sure, here are five non-formal ways to say "See you later" in French:
|
143 |
+
#
|
144 |
+
# 1. À plus tard
|
145 |
+
# 2. À plus
|
146 |
+
# 3. Salut
|
147 |
+
# 4. À toute
|
148 |
+
# 5. Bisous
|
149 |
+
#
|
150 |
+
# ```
|
151 |
+
# /\_/\
|
152 |
+
# ( o.o )
|
153 |
+
# > ^ <
|
154 |
+
# ```
|
155 |
+
```
|
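The request in the snippet above can be split into two small helpers, one that builds the JSON body and one that reads the reply defensively. This is a sketch under assumptions: the helper names `build_chat_payload` and `extract_reply` are made up for this example, and the `temperature` and `max_tokens` values are illustrative settings for the OpenAI-style chat completions body, not recommendations from the model card. The fake response below only mimics the fields the original snippet reads.

```python
import json
from typing import Any, Dict


def build_chat_payload(
    model: str,
    system_prompt: str,
    user_message: str,
    temperature: float = 0.2,  # illustrative value, not from the model card
    max_tokens: int = 256,     # illustrative value, not from the model card
) -> Dict[str, Any]:
    """Build the JSON body for a POST to /v1/chat/completions."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }


def extract_reply(response_body: Dict[str, Any]) -> str:
    """Return the first assistant message, or raise if the server sent none."""
    choices = response_body.get("choices") or []
    if not choices:
        raise ValueError(f"no choices in response: {json.dumps(response_body)[:200]}")
    return choices[0]["message"]["content"]


payload = build_chat_payload(
    "mistralai/Mistral-Small-24B-Instruct-2501",
    "You are a conversational agent that always answers straight to the point.",
    "Give me 5 non-formal ways to say 'See you later' in French.",
)

# Hand-written stand-in for a server response, used only to exercise extract_reply.
fake_response = {"choices": [{"message": {"role": "assistant", "content": "À plus !"}}]}
print(extract_reply(fake_response))
```

With a running server, `payload` would be sent as `requests.post(url, headers=headers, json=payload, timeout=60)` and the parsed body passed to `extract_reply`; calling `response.raise_for_status()` first makes HTTP errors surface before the JSON is read.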
156 |
+
|
157 |
+
#### Offline
|
158 |
+
|
159 |
+
```py
|
160 |
+
from vllm import LLM
|
161 |
+
from vllm.sampling_params import SamplingParams
|
162 |
+
from datetime import datetime, timedelta
|
163 |
+
|
164 |
+
SYSTEM_PROMPT = "You are a conversational agent that always answers straight to the point, always end your accurate response with an ASCII drawing of a cat."
|
165 |
+
|
166 |
+
user_prompt = "Give me 5 non-formal ways to say 'See you later' in French."
|
167 |
+
|
168 |
+
messages = [
|
169 |
+
{
|
170 |
+
"role": "system",
|
171 |
+
"content": SYSTEM_PROMPT
|
172 |
+
},
|
173 |
+
{
|
174 |
+
"role": "user",
|
175 |
+
"content": user_prompt
|
176 |
+
},
|
177 |
+
]
|
178 |
+
|
179 |
+
# note that running this model on GPU requires over 60 GB of GPU RAM
|
180 |
+
llm = LLM(model="mistralai/Mistral-Small-24B-Instruct-2501", tokenizer_mode="mistral", tensor_parallel_size=8)
|
181 |
+
|
182 |
+
sampling_params = SamplingParams(max_tokens=512)
|
183 |
+
|
184 |
+
outputs = llm.chat(messages, sampling_params=sampling_params)
|
185 |
+
|
186 |
+
print(outputs[0].outputs[0].text)
|
187 |
+
# Sure, here are five non-formal ways to say "See you later" in French:
|
188 |
+
#
|
189 |
+
# 1. À plus tard
|
190 |
+
# 2. À plus
|
191 |
+
# 3. Salut
|
192 |
+
# 4. À toute
|
193 |
+
# 5. Bisous
|
194 |
+
#
|
195 |
+
# ```
|
196 |
+
# /\_/\
|
197 |
+
# ( o.o )
|
198 |
+
# > ^ <
|
199 |
+
# ```
|
200 |
+
```
|
201 |
+
|
202 |
+
### Transformers
|
203 |
+
|
204 |
+
If you want to use Hugging Face transformers to generate text, you can do something like this.
|
205 |
+
|
206 |
+
```py
|
207 |
+
from transformers import pipeline
|
208 |
+
|
209 |
+
messages = [
|
210 |
+
{"role": "system", "content": "You are a conversational agent that always answers straight to the point, always end your accurate response with an ASCII drawing of a cat."},
|
211 |
+
{"role": "user", "content": "Give me 5 non-formal ways to say 'See you later' in French."},
|
212 |
+
]
|
213 |
+
chatbot = pipeline("text-generation", model="mistralai/Mistral-Small-24B-Instruct-2501", max_new_tokens=256)
|
214 |
+
chatbot(messages)
|
215 |
+
```
|
216 |
+
|
217 |
+
## The Mistral AI Team
|
218 |
+
|
219 |
+
Albert Jiang, Alexandre Sablayrolles, Alexis Tacnet, Alok Kothari, Antoine Roux, Arthur Mensch, Audrey Herblin-Stoop, Augustin Garreau, Austin Birky, Bam4d, Baptiste Bout, Baudouin de Monicault, Blanche Savary, Carole Rambaud, Caroline Feldman, Devendra Singh Chaplot, Diego de las Casas, Diogo Costa, Eleonore Arcelin, Emma Bou Hanna, Etienne Metzger, Gaspard Blanchet, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Harizo Rajaona, Henri Roussez, Hichem Sattouf, Ian Mack, Jean-Malo Delignon, Jessica Chudnovsky, Justus Murke, Kartik Khandelwal, Lawrence Stewart, Louis Martin, Louis Ternon, Lucile Saulnier, Lélio Renard Lavaud, Margaret Jennings, Marie Pellat, Marie Torelli, Marie-Anne Lachaux, Marjorie Janiewicz, Mickaël Seznec, Nicolas Schuhl, Niklas Muhs, Olivier de Garrigues, Patrick von Platen, Paul Jacob, Pauline Buche, Pavan Kumar Reddy, Perry Savas, Pierre Stock, Romain Sauvestre, Sagar Vaze, Sandeep Subramanian, Saurabh Garg, Sophia Yang, Szymon Antoniak, Teven Le Scao, Thibault Schueller, Thibaut Lavril, Thomas Wang, Théophile Gervet, Timothée Lacroix, Valera Nemychnikova, Wendy Shang, William El Sayed, William Marshall
|
config.json
ADDED
@@ -0,0 +1,37 @@
1 |
+
{
|
2 |
+
"architectures": [
|
3 |
+
"MistralForCausalLM"
|
4 |
+
],
|
5 |
+
"attention_dropout": 0.0,
|
6 |
+
"bos_token_id": 1,
|
7 |
+
"eos_token_id": 2,
|
8 |
+
"head_dim": 128,
|
9 |
+
"hidden_act": "silu",
|
10 |
+
"hidden_size": 5120,
|
11 |
+
"initializer_range": 0.02,
|
12 |
+
"intermediate_size": 32768,
|
13 |
+
"max_position_embeddings": 32768,
|
14 |
+
"model_type": "mistral",
|
15 |
+
"num_attention_heads": 32,
|
16 |
+
"num_hidden_layers": 40,
|
17 |
+
"num_key_value_heads": 8,
|
18 |
+
"rms_norm_eps": 1e-05,
|
19 |
+
"rope_theta": 100000000.0,
|
20 |
+
"sliding_window": null,
|
21 |
+
"tie_word_embeddings": false,
|
22 |
+
"torch_dtype": "bfloat16",
|
23 |
+
"transformers_version": "4.49.0.dev0",
|
24 |
+
"use_cache": true,
|
25 |
+
"vocab_size": 131072,
|
26 |
+
"quantization_config": {
|
27 |
+
"quant_method": "exl2",
|
28 |
+
"version": "0.2.7",
|
29 |
+
"bits": 6.5,
|
30 |
+
"head_bits": 8,
|
31 |
+
"calibration": {
|
32 |
+
"rows": 115,
|
33 |
+
"length": 8192,
|
34 |
+
"dataset": "(default)"
|
35 |
+
}
|
36 |
+
}
|
37 |
+
}
|
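The values in this config make it possible to sanity-check the VRAM figures quoted in the quant notes (20k fp16 context in 24GB, 32k with a q8 cache). The sketch below is a back-of-the-envelope estimate, not a measurement, and it rests on several assumptions: "20k" is read as 20 × 1024 tokens, the q8 cache is treated as exactly 1 byte per element with no scaling metadata, the parameter count is the nominal 24B from the model card, and the 8-bit head and 8bpw first and last layers are ignored in the weight estimate.

```python
# Values copied from config.json above.
NUM_HIDDEN_LAYERS = 40
NUM_KEY_VALUE_HEADS = 8
HEAD_DIM = 128

GIB = 2**30


def kv_cache_bytes_per_token(bytes_per_element: float) -> float:
    """Keys and values (factor 2) are cached for every layer and every KV head."""
    return 2 * NUM_HIDDEN_LAYERS * NUM_KEY_VALUE_HEADS * HEAD_DIM * bytes_per_element


def kv_cache_gib(context_tokens: int, bytes_per_element: float) -> float:
    """Size of the KV cache for a given context length, in GiB."""
    return context_tokens * kv_cache_bytes_per_token(bytes_per_element) / GIB


def weight_gib(num_params: float, bits_per_weight: float) -> float:
    """Rough weight footprint if every parameter used the average bit width."""
    return num_params * bits_per_weight / 8 / GIB


fp16_20k = kv_cache_gib(20 * 1024, 2)  # fp16 cache: 2 bytes per element
q8_32k = kv_cache_gib(32 * 1024, 1)    # assumed 1 byte per element for the q8 cache
weights = weight_gib(24e9, 6.5)        # nominal 24B parameters at 6.5 bits each

print(f"KV cache per token (fp16): {kv_cache_bytes_per_token(2):.0f} bytes")
print(f"KV cache, 20k fp16: {fp16_20k:.3f} GiB")
print(f"KV cache, 32k q8:   {q8_32k:.3f} GiB")
print(f"Weights at 6.5bpw:  {weights:.2f} GiB")
```

Under these assumptions the 32k q8 cache is smaller than the 20k fp16 cache, which is consistent with the note that the full context should fit once the cache is quantized. Activations and framework overhead are not modeled here.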
generation_config.json
ADDED
@@ -0,0 +1,6 @@
1 |
+
{
|
2 |
+
"_from_model_config": true,
|
3 |
+
"bos_token_id": 1,
|
4 |
+
"eos_token_id": 2,
|
5 |
+
"transformers_version": "4.49.0.dev0"
|
6 |
+
}
|
huggingface-metadata.txt
ADDED
@@ -0,0 +1,16 @@
1 |
+
url: https://huggingface.co/adamo1139/Mistral-Small-24B-Instruct-2501-ungated
|
2 |
+
branch: main
|
3 |
+
download date: 2025-01-30 16:28:49
|
4 |
+
sha256sum:
|
5 |
+
75a14c708eea501700a723dc74bc886cf36a1393686a3fb098ee106b160da32f model-00001-of-00010.safetensors
|
6 |
+
1ff40fbfd9e042b7dab3f3c9442f870a4701f53e394dda769807a160ba40f32a model-00002-of-00010.safetensors
|
7 |
+
4cc2d059fded71efd2947a414f32053b4ed3fa84383edf97b6d91fd9f04e4235 model-00003-of-00010.safetensors
|
8 |
+
aa0e9acacf161c45ae0d71ca3f7e4ec9ee55dae2153398da52f81ee4f9e1b8d2 model-00004-of-00010.safetensors
|
9 |
+
dafb696763d31a1fda58010b73ecc05c19d395da8ec2c24aa9c41da33f2230d3 model-00005-of-00010.safetensors
|
10 |
+
d9a433b19fd4d6986660a616a0d6fc7d02d9e8c0ab3c9b98940217ee6bd4e053 model-00006-of-00010.safetensors
|
11 |
+
5ac5c7b042491f917016c3e9635583177058f736be5fa315019b959fc3c43b63 model-00007-of-00010.safetensors
|
12 |
+
3c460c5b957ab3ac81f03bb20c0348820225dd9b819fa4487ae733b1e696e573 model-00008-of-00010.safetensors
|
13 |
+
304f66de1aeb55f0b4e1181885e3d15b65b485d5ce5c93b4adcdf7dd2c2d8cc5 model-00009-of-00010.safetensors
|
14 |
+
c4c3bcedc02f4dd7e04c8b0fe1199f4f27de7a37790d1510a8772ffe05093543 model-00010-of-00010.safetensors
|
15 |
+
c4b90a968dbc67ef3975129d0b78a2e3cbb6bea340ab9205f22e8a0308b1ffc5 tekken.json
|
16 |
+
b76085f9923309d873994d444989f7eb6ec074b06f25b58f1e8d7b7741070949 tokenizer.json
|
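A listing like the one above can be checked against downloaded files with a few lines of Python. This is a minimal sketch: the function names are made up for this example, and the listing format is assumed to be one `<digest> <file name>` pair per line, as shown. Note that the listed digests describe the source download (the `model-000NN-of-00010` shards and tokenizer files), so they are not expected to match the quantized `output-*.safetensors` shards.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict


def parse_checksums(listing: str) -> Dict[str, str]:
    """Map file name -> expected hex digest from `<digest> <name>` lines."""
    expected = {}
    for line in listing.splitlines():
        fields = line.split()
        # Skip headers such as `sha256sum:` and any line without a 64-character digest.
        if len(fields) == 2 and len(fields[0]) == 64:
            expected[fields[1]] = fields[0].lower()
    return expected


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte shards do not need to fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(directory: Path, expected: Dict[str, str]) -> Dict[str, bool]:
    """Return True per file only if it exists and its digest matches."""
    return {
        name: (directory / name).is_file() and sha256_of(directory / name) == digest
        for name, digest in expected.items()
    }


# Two entries copied from the listing above.
listing = """sha256sum:
c4b90a968dbc67ef3975129d0b78a2e3cbb6bea340ab9205f22e8a0308b1ffc5 tekken.json
b76085f9923309d873994d444989f7eb6ec074b06f25b58f1e8d7b7741070949 tokenizer.json
"""
expected = parse_checksums(listing)

# Self-check on a throwaway file so the sketch runs without the real downloads.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sample.bin").write_bytes(b"hello")
    demo_expected = {
        "sample.bin": hashlib.sha256(b"hello").hexdigest(),
        "missing.bin": "0" * 64,
    }
    demo_result = verify(Path(tmp), demo_expected)
print(demo_result)
```

To check a real download, `verify(Path("<download directory>"), expected)` would be called with the full listing parsed by `parse_checksums`; the directory path here is a placeholder.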
measurement.json
ADDED
The diff for this file is too large to render.
|
|
model.safetensors.index.json
ADDED
@@ -0,0 +1,370 @@
1 |
+
{
|
2 |
+
"metadata": {
|
3 |
+
"total_size": 47144806400
|
4 |
+
},
|
5 |
+
"weight_map": {
|
6 |
+
"lm_head.weight": "model-00010-of-00010.safetensors",
|
7 |
+
"model.embed_tokens.weight": "model-00001-of-00010.safetensors",
|
8 |
+
"model.layers.0.input_layernorm.weight": "model-00001-of-00010.safetensors",
|
9 |
+
"model.layers.0.mlp.down_proj.weight": "model-00001-of-00010.safetensors",
|
10 |
+
"model.layers.0.mlp.gate_proj.weight": "model-00001-of-00010.safetensors",
|
11 |
+
"model.layers.0.mlp.up_proj.weight": "model-00001-of-00010.safetensors",
|
12 |
+
"model.layers.0.post_attention_layernorm.weight": "model-00001-of-00010.safetensors",
|
13 |
+
"model.layers.0.self_attn.k_proj.weight": "model-00001-of-00010.safetensors",
|
14 |
+
"model.layers.0.self_attn.o_proj.weight": "model-00001-of-00010.safetensors",
|
15 |
+
"model.layers.0.self_attn.q_proj.weight": "model-00001-of-00010.safetensors",
|
16 |
+
"model.layers.0.self_attn.v_proj.weight": "model-00001-of-00010.safetensors",
|
17 |
+
"model.layers.1.input_layernorm.weight": "model-00001-of-00010.safetensors",
|
18 |
+
"model.layers.1.mlp.down_proj.weight": "model-00001-of-00010.safetensors",
|
19 |
+
"model.layers.1.mlp.gate_proj.weight": "model-00001-of-00010.safetensors",
|
20 |
+
"model.layers.1.mlp.up_proj.weight": "model-00001-of-00010.safetensors",
|
21 |
+
"model.layers.1.post_attention_layernorm.weight": "model-00001-of-00010.safetensors",
|
22 |
+
"model.layers.1.self_attn.k_proj.weight": "model-00001-of-00010.safetensors",
|
23 |
+
"model.layers.1.self_attn.o_proj.weight": "model-00001-of-00010.safetensors",
|
24 |
+
"model.layers.1.self_attn.q_proj.weight": "model-00001-of-00010.safetensors",
|
25 |
+
"model.layers.1.self_attn.v_proj.weight": "model-00001-of-00010.safetensors",
|
26 |
+
"model.layers.10.input_layernorm.weight": "model-00003-of-00010.safetensors",
|
27 |
+
"model.layers.10.mlp.down_proj.weight": "model-00003-of-00010.safetensors",
|
28 |
+
"model.layers.10.mlp.gate_proj.weight": "model-00003-of-00010.safetensors",
|
29 |
+
"model.layers.10.mlp.up_proj.weight": "model-00003-of-00010.safetensors",
|
30 |
+
"model.layers.10.post_attention_layernorm.weight": "model-00003-of-00010.safetensors",
|
31 |
+
"model.layers.10.self_attn.k_proj.weight": "model-00003-of-00010.safetensors",
|
32 |
+
"model.layers.10.self_attn.o_proj.weight": "model-00003-of-00010.safetensors",
|
33 |
+
"model.layers.10.self_attn.q_proj.weight": "model-00003-of-00010.safetensors",
|
34 |
+
"model.layers.10.self_attn.v_proj.weight": "model-00003-of-00010.safetensors",
|
35 |
+
"model.layers.11.input_layernorm.weight": "model-00004-of-00010.safetensors",
|
36 |
+
"model.layers.11.mlp.down_proj.weight": "model-00004-of-00010.safetensors",
|
37 |
+
"model.layers.11.mlp.gate_proj.weight": "model-00003-of-00010.safetensors",
|
38 |
+
"model.layers.11.mlp.up_proj.weight": "model-00003-of-00010.safetensors",
|
39 |
+
"model.layers.11.post_attention_layernorm.weight": "model-00004-of-00010.safetensors",
|
40 |
+
"model.layers.11.self_attn.k_proj.weight": "model-00003-of-00010.safetensors",
|
41 |
+
"model.layers.11.self_attn.o_proj.weight": "model-00003-of-00010.safetensors",
|
42 |
+
"model.layers.11.self_attn.q_proj.weight": "model-00003-of-00010.safetensors",
|
43 |
+
"model.layers.11.self_attn.v_proj.weight": "model-00003-of-00010.safetensors",
|
44 |
+
"model.layers.12.input_layernorm.weight": "model-00004-of-00010.safetensors",
|
45 |
+
"model.layers.12.mlp.down_proj.weight": "model-00004-of-00010.safetensors",
|
46 |
+
"model.layers.12.mlp.gate_proj.weight": "model-00004-of-00010.safetensors",
|
47 |
+
"model.layers.12.mlp.up_proj.weight": "model-00004-of-00010.safetensors",
|
48 |
+
"model.layers.12.post_attention_layernorm.weight": "model-00004-of-00010.safetensors",
|
49 |
+
"model.layers.12.self_attn.k_proj.weight": "model-00004-of-00010.safetensors",
|
50 |
+
"model.layers.12.self_attn.o_proj.weight": "model-00004-of-00010.safetensors",
|
51 |
+
"model.layers.12.self_attn.q_proj.weight": "model-00004-of-00010.safetensors",
|
52 |
+
"model.layers.12.self_attn.v_proj.weight": "model-00004-of-00010.safetensors",
|
53 |
+
"model.layers.13.input_layernorm.weight": "model-00004-of-00010.safetensors",
|
54 |
+
"model.layers.13.mlp.down_proj.weight": "model-00004-of-00010.safetensors",
|
55 |
+
"model.layers.13.mlp.gate_proj.weight": "model-00004-of-00010.safetensors",
|
56 |
+
"model.layers.13.mlp.up_proj.weight": "model-00004-of-00010.safetensors",
|
57 |
+
"model.layers.13.post_attention_layernorm.weight": "model-00004-of-00010.safetensors",
|
58 |
+
"model.layers.13.self_attn.k_proj.weight": "model-00004-of-00010.safetensors",
|
59 |
+
"model.layers.13.self_attn.o_proj.weight": "model-00004-of-00010.safetensors",
|
60 |
+
"model.layers.13.self_attn.q_proj.weight": "model-00004-of-00010.safetensors",
|
61 |
+
"model.layers.13.self_attn.v_proj.weight": "model-00004-of-00010.safetensors",
|
62 |
+
"model.layers.14.input_layernorm.weight": "model-00004-of-00010.safetensors",
|
63 |
+
"model.layers.14.mlp.down_proj.weight": "model-00004-of-00010.safetensors",
|
64 |
+
"model.layers.14.mlp.gate_proj.weight": "model-00004-of-00010.safetensors",
|
65 |
+
"model.layers.14.mlp.up_proj.weight": "model-00004-of-00010.safetensors",
|
66 |
+
"model.layers.14.post_attention_layernorm.weight": "model-00004-of-00010.safetensors",
|
67 |
+
"model.layers.14.self_attn.k_proj.weight": "model-00004-of-00010.safetensors",
|
68 |
+
"model.layers.14.self_attn.o_proj.weight": "model-00004-of-00010.safetensors",
|
69 |
+
"model.layers.14.self_attn.q_proj.weight": "model-00004-of-00010.safetensors",
|
70 |
+
"model.layers.14.self_attn.v_proj.weight": "model-00004-of-00010.safetensors",
|
71 |
+
"model.layers.15.input_layernorm.weight": "model-00004-of-00010.safetensors",
|
72 |
+
"model.layers.15.mlp.down_proj.weight": "model-00004-of-00010.safetensors",
|
73 |
+
"model.layers.15.mlp.gate_proj.weight": "model-00004-of-00010.safetensors",
|
74 |
+
"model.layers.15.mlp.up_proj.weight": "model-00004-of-00010.safetensors",
|
75 |
+
"model.layers.15.post_attention_layernorm.weight": "model-00004-of-00010.safetensors",
|
76 |
+
"model.layers.15.self_attn.k_proj.weight": "model-00004-of-00010.safetensors",
|
77 |
+
"model.layers.15.self_attn.o_proj.weight": "model-00004-of-00010.safetensors",
|
78 |
+
"model.layers.15.self_attn.q_proj.weight": "model-00004-of-00010.safetensors",
|
79 |
+
"model.layers.15.self_attn.v_proj.weight": "model-00004-of-00010.safetensors",
|
80 |
+
"model.layers.16.input_layernorm.weight": "model-00005-of-00010.safetensors",
|
81 |
+
"model.layers.16.mlp.down_proj.weight": "model-00005-of-00010.safetensors",
|
82 |
+
"model.layers.16.mlp.gate_proj.weight": "model-00005-of-00010.safetensors",
|
83 |
+
"model.layers.16.mlp.up_proj.weight": "model-00005-of-00010.safetensors",
|
84 |
+
"model.layers.16.post_attention_layernorm.weight": "model-00005-of-00010.safetensors",
|
85 |
+
"model.layers.16.self_attn.k_proj.weight": "model-00004-of-00010.safetensors",
|
86 |
+
"model.layers.16.self_attn.o_proj.weight": "model-00004-of-00010.safetensors",
|
87 |
+
"model.layers.16.self_attn.q_proj.weight": "model-00004-of-00010.safetensors",
|
88 |
+
"model.layers.16.self_attn.v_proj.weight": "model-00004-of-00010.safetensors",
|
89 |
+
"model.layers.17.input_layernorm.weight": "model-00005-of-00010.safetensors",
|
90 |
+
"model.layers.17.mlp.down_proj.weight": "model-00005-of-00010.safetensors",
|
91 |
+
"model.layers.17.mlp.gate_proj.weight": "model-00005-of-00010.safetensors",
|
92 |
+
"model.layers.17.mlp.up_proj.weight": "model-00005-of-00010.safetensors",
|
93 |
+
"model.layers.17.post_attention_layernorm.weight": "model-00005-of-00010.safetensors",
|
94 |
+
"model.layers.17.self_attn.k_proj.weight": "model-00005-of-00010.safetensors",
|
95 |
+
"model.layers.17.self_attn.o_proj.weight": "model-00005-of-00010.safetensors",
|
96 |
+
"model.layers.17.self_attn.q_proj.weight": "model-00005-of-00010.safetensors",
|
97 |
+
"model.layers.17.self_attn.v_proj.weight": "model-00005-of-00010.safetensors",
|
98 |
+
"model.layers.18.input_layernorm.weight": "model-00005-of-00010.safetensors",
|
99 |
+
"model.layers.18.mlp.down_proj.weight": "model-00005-of-00010.safetensors",
|
100 |
+
"model.layers.18.mlp.gate_proj.weight": "model-00005-of-00010.safetensors",
|
101 |
+
"model.layers.18.mlp.up_proj.weight": "model-00005-of-00010.safetensors",
|
102 |
+
"model.layers.18.post_attention_layernorm.weight": "model-00005-of-00010.safetensors",
|
103 |
+
"model.layers.18.self_attn.k_proj.weight": "model-00005-of-00010.safetensors",
|
104 |
+
"model.layers.18.self_attn.o_proj.weight": "model-00005-of-00010.safetensors",
|
105 |
+
"model.layers.18.self_attn.q_proj.weight": "model-00005-of-00010.safetensors",
|
106 |
+
"model.layers.18.self_attn.v_proj.weight": "model-00005-of-00010.safetensors",
|
107 |
+
"model.layers.19.input_layernorm.weight": "model-00005-of-00010.safetensors",
|
108 |
+
"model.layers.19.mlp.down_proj.weight": "model-00005-of-00010.safetensors",
|
109 |
+
"model.layers.19.mlp.gate_proj.weight": "model-00005-of-00010.safetensors",
|
110 |
+
"model.layers.19.mlp.up_proj.weight": "model-00005-of-00010.safetensors",
|
111 |
+
"model.layers.19.post_attention_layernorm.weight": "model-00005-of-00010.safetensors",
|
112 |
+
"model.layers.19.self_attn.k_proj.weight": "model-00005-of-00010.safetensors",
|
113 |
+
"model.layers.19.self_attn.o_proj.weight": "model-00005-of-00010.safetensors",
|
114 |
+
"model.layers.19.self_attn.q_proj.weight": "model-00005-of-00010.safetensors",
|
115 |
+
"model.layers.19.self_attn.v_proj.weight": "model-00005-of-00010.safetensors",
|
116 |
+
"model.layers.2.input_layernorm.weight": "model-00001-of-00010.safetensors",
|
117 |
+
"model.layers.2.mlp.down_proj.weight": "model-00001-of-00010.safetensors",
|
118 |
+
"model.layers.2.mlp.gate_proj.weight": "model-00001-of-00010.safetensors",
|
119 |
+
"model.layers.2.mlp.up_proj.weight": "model-00001-of-00010.safetensors",
|
120 |
+
"model.layers.2.post_attention_layernorm.weight": "model-00001-of-00010.safetensors",
|
121 |
+
"model.layers.2.self_attn.k_proj.weight": "model-00001-of-00010.safetensors",
|
122 |
+
"model.layers.2.self_attn.o_proj.weight": "model-00001-of-00010.safetensors",
|
123 |
+
"model.layers.2.self_attn.q_proj.weight": "model-00001-of-00010.safetensors",
|
124 |
+
"model.layers.2.self_attn.v_proj.weight": "model-00001-of-00010.safetensors",
|
125 |
+
"model.layers.20.input_layernorm.weight": "model-00006-of-00010.safetensors",
|
126 |
+
"model.layers.20.mlp.down_proj.weight": "model-00006-of-00010.safetensors",
|
127 |
+
"model.layers.20.mlp.gate_proj.weight": "model-00005-of-00010.safetensors",
|
128 |
+
"model.layers.20.mlp.up_proj.weight": "model-00006-of-00010.safetensors",
|
129 |
+
"model.layers.20.post_attention_layernorm.weight": "model-00006-of-00010.safetensors",
|
130 |
+
"model.layers.20.self_attn.k_proj.weight": "model-00005-of-00010.safetensors",
|
131 |
+
"model.layers.20.self_attn.o_proj.weight": "model-00005-of-00010.safetensors",
|
132 |
+
"model.layers.20.self_attn.q_proj.weight": "model-00005-of-00010.safetensors",
|
133 |
+
"model.layers.20.self_attn.v_proj.weight": "model-00005-of-00010.safetensors",
|
134 |
+
"model.layers.21.input_layernorm.weight": "model-00006-of-00010.safetensors",
|
135 |
+
"model.layers.21.mlp.down_proj.weight": "model-00006-of-00010.safetensors",
|
136 |
+
"model.layers.21.mlp.gate_proj.weight": "model-00006-of-00010.safetensors",
|
137 |
+
"model.layers.21.mlp.up_proj.weight": "model-00006-of-00010.safetensors",
|
138 |
+
"model.layers.21.post_attention_layernorm.weight": "model-00006-of-00010.safetensors",
|
139 |
+
"model.layers.21.self_attn.k_proj.weight": "model-00006-of-00010.safetensors",
|
140 |
+
"model.layers.21.self_attn.o_proj.weight": "model-00006-of-00010.safetensors",
|
141 |
+
"model.layers.21.self_attn.q_proj.weight": "model-00006-of-00010.safetensors",
|
142 |
+
"model.layers.21.self_attn.v_proj.weight": "model-00006-of-00010.safetensors",
|
143 |
+
"model.layers.22.input_layernorm.weight": "model-00006-of-00010.safetensors",
|
144 |
+
"model.layers.22.mlp.down_proj.weight": "model-00006-of-00010.safetensors",
|
145 |
+
"model.layers.22.mlp.gate_proj.weight": "model-00006-of-00010.safetensors",
|
146 |
+
"model.layers.22.mlp.up_proj.weight": "model-00006-of-00010.safetensors",
|
147 |
+
"model.layers.22.post_attention_layernorm.weight": "model-00006-of-00010.safetensors",
|
148 |
+
"model.layers.22.self_attn.k_proj.weight": "model-00006-of-00010.safetensors",
|
149 |
+
"model.layers.22.self_attn.o_proj.weight": "model-00006-of-00010.safetensors",
|
150 |
+
"model.layers.22.self_attn.q_proj.weight": "model-00006-of-00010.safetensors",
|
151 |
+
"model.layers.22.self_attn.v_proj.weight": "model-00006-of-00010.safetensors",
|
152 |
+
"model.layers.23.input_layernorm.weight": "model-00006-of-00010.safetensors",
|
153 |
+
"model.layers.23.mlp.down_proj.weight": "model-00006-of-00010.safetensors",
|
154 |
+
"model.layers.23.mlp.gate_proj.weight": "model-00006-of-00010.safetensors",
|
155 |
+
"model.layers.23.mlp.up_proj.weight": "model-00006-of-00010.safetensors",
|
156 |
+
"model.layers.23.post_attention_layernorm.weight": "model-00006-of-00010.safetensors",
|
157 |
+
"model.layers.23.self_attn.k_proj.weight": "model-00006-of-00010.safetensors",
|
158 |
+
"model.layers.23.self_attn.o_proj.weight": "model-00006-of-00010.safetensors",
|
159 |
+
"model.layers.23.self_attn.q_proj.weight": "model-00006-of-00010.safetensors",
|
160 |
+
"model.layers.23.self_attn.v_proj.weight": "model-00006-of-00010.safetensors",
|
161 |
+
"model.layers.24.input_layernorm.weight": "model-00007-of-00010.safetensors",
|
162 |
+
"model.layers.24.mlp.down_proj.weight": "model-00007-of-00010.safetensors",
|
163 |
+
"model.layers.24.mlp.gate_proj.weight": "model-00006-of-00010.safetensors",
|
164 |
+
"model.layers.24.mlp.up_proj.weight": "model-00006-of-00010.safetensors",
|
165 |
+
"model.layers.24.post_attention_layernorm.weight": "model-00007-of-00010.safetensors",
|
166 |
+
"model.layers.24.self_attn.k_proj.weight": "model-00006-of-00010.safetensors",
|
167 |
+
"model.layers.24.self_attn.o_proj.weight": "model-00006-of-00010.safetensors",
|
168 |
+
"model.layers.24.self_attn.q_proj.weight": "model-00006-of-00010.safetensors",
|
169 |
+
"model.layers.24.self_attn.v_proj.weight": "model-00006-of-00010.safetensors",
+    "model.layers.25.input_layernorm.weight": "model-00007-of-00010.safetensors",
+    "model.layers.25.mlp.down_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.25.mlp.gate_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.25.mlp.up_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.25.post_attention_layernorm.weight": "model-00007-of-00010.safetensors",
+    "model.layers.25.self_attn.k_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.25.self_attn.o_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.25.self_attn.q_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.25.self_attn.v_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.26.input_layernorm.weight": "model-00007-of-00010.safetensors",
+    "model.layers.26.mlp.down_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.26.mlp.gate_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.26.mlp.up_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.26.post_attention_layernorm.weight": "model-00007-of-00010.safetensors",
+    "model.layers.26.self_attn.k_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.26.self_attn.o_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.26.self_attn.q_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.26.self_attn.v_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.27.input_layernorm.weight": "model-00007-of-00010.safetensors",
+    "model.layers.27.mlp.down_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.27.mlp.gate_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.27.mlp.up_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.27.post_attention_layernorm.weight": "model-00007-of-00010.safetensors",
+    "model.layers.27.self_attn.k_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.27.self_attn.o_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.27.self_attn.q_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.27.self_attn.v_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.28.input_layernorm.weight": "model-00007-of-00010.safetensors",
+    "model.layers.28.mlp.down_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.28.mlp.gate_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.28.mlp.up_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.28.post_attention_layernorm.weight": "model-00007-of-00010.safetensors",
+    "model.layers.28.self_attn.k_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.28.self_attn.o_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.28.self_attn.q_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.28.self_attn.v_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.29.input_layernorm.weight": "model-00008-of-00010.safetensors",
+    "model.layers.29.mlp.down_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.29.mlp.gate_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.29.mlp.up_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.29.post_attention_layernorm.weight": "model-00008-of-00010.safetensors",
+    "model.layers.29.self_attn.k_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.29.self_attn.o_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.29.self_attn.q_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.29.self_attn.v_proj.weight": "model-00007-of-00010.safetensors",
+    "model.layers.3.input_layernorm.weight": "model-00002-of-00010.safetensors",
+    "model.layers.3.mlp.down_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.3.mlp.gate_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.3.mlp.up_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.3.post_attention_layernorm.weight": "model-00002-of-00010.safetensors",
+    "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00010.safetensors",
+    "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00010.safetensors",
+    "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00010.safetensors",
+    "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00010.safetensors",
+    "model.layers.30.input_layernorm.weight": "model-00008-of-00010.safetensors",
+    "model.layers.30.mlp.down_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.30.mlp.gate_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.30.mlp.up_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.30.post_attention_layernorm.weight": "model-00008-of-00010.safetensors",
+    "model.layers.30.self_attn.k_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.30.self_attn.o_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.30.self_attn.q_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.30.self_attn.v_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.31.input_layernorm.weight": "model-00008-of-00010.safetensors",
+    "model.layers.31.mlp.down_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.31.mlp.gate_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.31.mlp.up_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.31.post_attention_layernorm.weight": "model-00008-of-00010.safetensors",
+    "model.layers.31.self_attn.k_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.31.self_attn.o_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.31.self_attn.q_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.31.self_attn.v_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.32.input_layernorm.weight": "model-00008-of-00010.safetensors",
+    "model.layers.32.mlp.down_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.32.mlp.gate_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.32.mlp.up_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.32.post_attention_layernorm.weight": "model-00008-of-00010.safetensors",
+    "model.layers.32.self_attn.k_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.32.self_attn.o_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.32.self_attn.q_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.32.self_attn.v_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.33.input_layernorm.weight": "model-00009-of-00010.safetensors",
+    "model.layers.33.mlp.down_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.33.mlp.gate_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.33.mlp.up_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.33.post_attention_layernorm.weight": "model-00009-of-00010.safetensors",
+    "model.layers.33.self_attn.k_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.33.self_attn.o_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.33.self_attn.q_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.33.self_attn.v_proj.weight": "model-00008-of-00010.safetensors",
+    "model.layers.34.input_layernorm.weight": "model-00009-of-00010.safetensors",
+    "model.layers.34.mlp.down_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.34.mlp.gate_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.34.mlp.up_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.34.post_attention_layernorm.weight": "model-00009-of-00010.safetensors",
+    "model.layers.34.self_attn.k_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.34.self_attn.o_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.34.self_attn.q_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.34.self_attn.v_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.35.input_layernorm.weight": "model-00009-of-00010.safetensors",
+    "model.layers.35.mlp.down_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.35.mlp.gate_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.35.mlp.up_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.35.post_attention_layernorm.weight": "model-00009-of-00010.safetensors",
+    "model.layers.35.self_attn.k_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.35.self_attn.o_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.35.self_attn.q_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.35.self_attn.v_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.36.input_layernorm.weight": "model-00009-of-00010.safetensors",
+    "model.layers.36.mlp.down_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.36.mlp.gate_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.36.mlp.up_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.36.post_attention_layernorm.weight": "model-00009-of-00010.safetensors",
+    "model.layers.36.self_attn.k_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.36.self_attn.o_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.36.self_attn.q_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.36.self_attn.v_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.37.input_layernorm.weight": "model-00010-of-00010.safetensors",
+    "model.layers.37.mlp.down_proj.weight": "model-00010-of-00010.safetensors",
+    "model.layers.37.mlp.gate_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.37.mlp.up_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.37.post_attention_layernorm.weight": "model-00010-of-00010.safetensors",
+    "model.layers.37.self_attn.k_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.37.self_attn.o_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.37.self_attn.q_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.37.self_attn.v_proj.weight": "model-00009-of-00010.safetensors",
+    "model.layers.38.input_layernorm.weight": "model-00010-of-00010.safetensors",
+    "model.layers.38.mlp.down_proj.weight": "model-00010-of-00010.safetensors",
+    "model.layers.38.mlp.gate_proj.weight": "model-00010-of-00010.safetensors",
+    "model.layers.38.mlp.up_proj.weight": "model-00010-of-00010.safetensors",
+    "model.layers.38.post_attention_layernorm.weight": "model-00010-of-00010.safetensors",
+    "model.layers.38.self_attn.k_proj.weight": "model-00010-of-00010.safetensors",
+    "model.layers.38.self_attn.o_proj.weight": "model-00010-of-00010.safetensors",
+    "model.layers.38.self_attn.q_proj.weight": "model-00010-of-00010.safetensors",
+    "model.layers.38.self_attn.v_proj.weight": "model-00010-of-00010.safetensors",
+    "model.layers.39.input_layernorm.weight": "model-00010-of-00010.safetensors",
+    "model.layers.39.mlp.down_proj.weight": "model-00010-of-00010.safetensors",
+    "model.layers.39.mlp.gate_proj.weight": "model-00010-of-00010.safetensors",
+    "model.layers.39.mlp.up_proj.weight": "model-00010-of-00010.safetensors",
+    "model.layers.39.post_attention_layernorm.weight": "model-00010-of-00010.safetensors",
+    "model.layers.39.self_attn.k_proj.weight": "model-00010-of-00010.safetensors",
+    "model.layers.39.self_attn.o_proj.weight": "model-00010-of-00010.safetensors",
+    "model.layers.39.self_attn.q_proj.weight": "model-00010-of-00010.safetensors",
+    "model.layers.39.self_attn.v_proj.weight": "model-00010-of-00010.safetensors",
+    "model.layers.4.input_layernorm.weight": "model-00002-of-00010.safetensors",
+    "model.layers.4.mlp.down_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.4.mlp.gate_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.4.mlp.up_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.4.post_attention_layernorm.weight": "model-00002-of-00010.safetensors",
+    "model.layers.4.self_attn.k_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.4.self_attn.o_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.4.self_attn.q_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.4.self_attn.v_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.5.input_layernorm.weight": "model-00002-of-00010.safetensors",
+    "model.layers.5.mlp.down_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.5.mlp.gate_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.5.mlp.up_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.5.post_attention_layernorm.weight": "model-00002-of-00010.safetensors",
+    "model.layers.5.self_attn.k_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.5.self_attn.o_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.5.self_attn.q_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.5.self_attn.v_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.6.input_layernorm.weight": "model-00002-of-00010.safetensors",
+    "model.layers.6.mlp.down_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.6.mlp.gate_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.6.mlp.up_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.6.post_attention_layernorm.weight": "model-00002-of-00010.safetensors",
+    "model.layers.6.self_attn.k_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.6.self_attn.o_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.6.self_attn.q_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.6.self_attn.v_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.7.input_layernorm.weight": "model-00003-of-00010.safetensors",
+    "model.layers.7.mlp.down_proj.weight": "model-00003-of-00010.safetensors",
+    "model.layers.7.mlp.gate_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.7.mlp.up_proj.weight": "model-00003-of-00010.safetensors",
+    "model.layers.7.post_attention_layernorm.weight": "model-00003-of-00010.safetensors",
+    "model.layers.7.self_attn.k_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.7.self_attn.o_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.7.self_attn.q_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.7.self_attn.v_proj.weight": "model-00002-of-00010.safetensors",
+    "model.layers.8.input_layernorm.weight": "model-00003-of-00010.safetensors",
+    "model.layers.8.mlp.down_proj.weight": "model-00003-of-00010.safetensors",
+    "model.layers.8.mlp.gate_proj.weight": "model-00003-of-00010.safetensors",
+    "model.layers.8.mlp.up_proj.weight": "model-00003-of-00010.safetensors",
+    "model.layers.8.post_attention_layernorm.weight": "model-00003-of-00010.safetensors",
+    "model.layers.8.self_attn.k_proj.weight": "model-00003-of-00010.safetensors",
+    "model.layers.8.self_attn.o_proj.weight": "model-00003-of-00010.safetensors",
+    "model.layers.8.self_attn.q_proj.weight": "model-00003-of-00010.safetensors",
+    "model.layers.8.self_attn.v_proj.weight": "model-00003-of-00010.safetensors",
+    "model.layers.9.input_layernorm.weight": "model-00003-of-00010.safetensors",
+    "model.layers.9.mlp.down_proj.weight": "model-00003-of-00010.safetensors",
+    "model.layers.9.mlp.gate_proj.weight": "model-00003-of-00010.safetensors",
+    "model.layers.9.mlp.up_proj.weight": "model-00003-of-00010.safetensors",
+    "model.layers.9.post_attention_layernorm.weight": "model-00003-of-00010.safetensors",
+    "model.layers.9.self_attn.k_proj.weight": "model-00003-of-00010.safetensors",
+    "model.layers.9.self_attn.o_proj.weight": "model-00003-of-00010.safetensors",
+    "model.layers.9.self_attn.q_proj.weight": "model-00003-of-00010.safetensors",
+    "model.layers.9.self_attn.v_proj.weight": "model-00003-of-00010.safetensors",
+    "model.norm.weight": "model-00010-of-00010.safetensors"
+  }
+}
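Note on the index above: the `weight_map` object maps each tensor name to the shard file that stores it, and a single layer can straddle two shards (layer 37 is split between shards 9 and 10). The entries name ten `model-*-of-00010` shards, whereas this commit adds five `output-*-of-00005` files, so the index may describe a different sharding than the files uploaded here. The following is a minimal Python sketch, not part of the commit, that groups tensor names by shard; the helper name `tensors_by_shard` and the three-entry sample are illustrative assumptions, with the sample entries copied from the diff.

```python
from collections import defaultdict


def tensors_by_shard(index: dict) -> dict:
    """Group tensor names by the shard file listed in index["weight_map"]."""
    by_shard = defaultdict(list)
    for tensor_name, shard_file in index["weight_map"].items():
        by_shard[shard_file].append(tensor_name)
    # Sort shards and tensor names so the result is deterministic.
    return {shard: sorted(names) for shard, names in sorted(by_shard.items())}


# Three entries copied from the diff; a real index holds every tensor.
sample_index = {
    "weight_map": {
        "model.layers.37.input_layernorm.weight": "model-00010-of-00010.safetensors",
        "model.layers.37.mlp.gate_proj.weight": "model-00009-of-00010.safetensors",
        "model.norm.weight": "model-00010-of-00010.safetensors",
    }
}

grouped = tensors_by_shard(sample_index)
for shard, names in grouped.items():
    print(shard, len(names))
```

To run this against a full index, load the file with `json.load` and pass the resulting dictionary in place of `sample_index`.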
output-00001-of-00005.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:24552ca5f03868e941b372476ef2ffb4f734ebf43630443628203d3bc528801c
+size 4257254246
output-00002-of-00005.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a7a92dcc70d83aa8c6101096ca4536e4ede679a1852c3910f57f884405d6edef
+size 4215673900
output-00003-of-00005.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9bd243e0d10fcfe3e6f12f778e753fe2ced31f6125b95093c8480fc455ac2239
+size 4259733350
output-00004-of-00005.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a66f1ccdce7ecab5d36f348df0f8e7e150f3fb91d0572bd1d14ccede4a31e365
+size 4171011980
output-00005-of-00005.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d7ec7e0497856e6e958bccfd385388ec6b007b7d44c9427cb6edb5a8dee77666
+size 3274941370
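Note on the five shard entries above: each is a Git LFS pointer, a three-line text stub with `version`, `oid`, and `size` keys, rather than the weights themselves. The following is a minimal Python sketch, not part of the commit, that parses one pointer and totals the five `size` values shown in the diff; the helper name `parse_lfs_pointer` is an illustrative assumption.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real object in bytes
    }


# Pointer text copied from output-00001-of-00005.safetensors in the diff.
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:24552ca5f03868e941b372476ef2ffb4f734ebf43630443628203d3bc528801c\n"
    "size 4257254246\n"
)

# The five shard sizes listed in the diff, in bytes.
shard_sizes = [4257254246, 4215673900, 4259733350, 4171011980, 3274941370]
total_bytes = sum(shard_sizes)
print(f"{total_bytes} bytes, about {total_bytes / 1e9:.2f} GB")
```

The total gives the download size of the weights alone; the tokenizer files are tracked as separate LFS objects.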
params.json
ADDED
@@ -0,0 +1,12 @@
+{
+    "dim": 5120,
+    "n_layers": 40,
+    "head_dim": 128,
+    "hidden_dim": 32768,
+    "n_heads": 32,
+    "n_kv_heads": 8,
+    "norm_eps": 1e-05,
+    "vocab_size": 131072,
+    "rope_theta": 100000000.0,
+    "max_seq_len": 32768
+}
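Note on `params.json` above: a few derived quantities follow from these values by plain arithmetic. The following is a minimal Python sketch, not part of the commit. It assumes the field names carry their usual meaning (`n_kv_heads` as the number of key/value heads under grouped-query attention, `head_dim` as the per-head width) and assumes a 2-byte (fp16 or bf16) KV cache; a quantized cache would be smaller. The helper name `kv_cache_bytes` is illustrative.

```python
# Values copied from params.json in the diff.
params = {
    "dim": 5120,
    "n_layers": 40,
    "head_dim": 128,
    "hidden_dim": 32768,
    "n_heads": 32,
    "n_kv_heads": 8,
    "max_seq_len": 32768,
}

# Total query width; here it differs from the model width "dim".
query_width = params["n_heads"] * params["head_dim"]

# Query heads sharing each key/value head (assumes grouped-query attention).
queries_per_kv_head = params["n_heads"] // params["n_kv_heads"]


def kv_cache_bytes(p: dict, seq_len: int, bytes_per_elem: int = 2) -> int:
    """Estimate KV-cache size: keys and values for every layer and KV head.

    bytes_per_elem=2 is an assumption (fp16/bf16 cache), not from the commit.
    """
    return 2 * p["n_layers"] * p["n_kv_heads"] * p["head_dim"] * seq_len * bytes_per_elem


full_context_cache = kv_cache_bytes(params, params["max_seq_len"])
print(query_width, queries_per_kv_head, full_context_cache / 2**30)
```

Under these assumptions the cache costs 163840 bytes per token, which is useful when judging how much context fits beside the quantized weights.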
special_tokens_map.json
ADDED
The diff for this file is too large to render.
tekken.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c4b90a968dbc67ef3975129d0b78a2e3cbb6bea340ab9205f22e8a0308b1ffc5
+size 14801223
tokenizer.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b76085f9923309d873994d444989f7eb6ec074b06f25b58f1e8d7b7741070949
+size 17078037
tokenizer_config.json
ADDED
The diff for this file is too large to render.