Upload folder using huggingface_hub
- .gitattributes +1 -0
- README.md +154 -0
- config.json +28 -0
- generation_config.json +6 -0
- pytorch_model-00001-of-00010.bin +3 -0
- pytorch_model-00002-of-00010.bin +3 -0
- pytorch_model-00003-of-00010.bin +3 -0
- pytorch_model-00004-of-00010.bin +3 -0
- pytorch_model-00005-of-00010.bin +3 -0
- pytorch_model-00006-of-00010.bin +3 -0
- pytorch_model-00007-of-00010.bin +3 -0
- pytorch_model-00008-of-00010.bin +3 -0
- pytorch_model-00009-of-00010.bin +3 -0
- pytorch_model-00010-of-00010.bin +3 -0
- pytorch_model.bin.index.json +370 -0
- special_tokens_map.json +30 -0
- tokenizer.json +3 -0
- tokenizer_config.json +0 -0
.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED

# Krutrim-2

## Model Overview
Krutrim-2 is a 12B parameter language model developed by the OLA Krutrim team. It is based on the Mistral-NeMo 12B architecture and has undergone continual pretraining with 500B tokens across various domains, including web data, code, math, Indic languages, Indian context data, synthetic data, and books. Following pretraining, the model was finetuned on 1.5M data points covering a diverse range of tasks, including knowledge recall, math, reasoning, coding, safety & non-compliance, instruction following, creative writing, and role-playing.

After fine-tuning, the model underwent Direct Preference Optimization (DPO) with 300K data points to enhance alignment across multiple aspects. DPO was applied to improve response helpfulness, safety, and compliance, making the model more robust against harmful prompts, reducing biases, and improving factual consistency.
## Key Features
- Supports long context up to 128K tokens
- Available in both pre-trained and instruction-tuned versions
- Supports English and 22 scheduled Indian languages
- Demonstrates robust knowledge of Indic culture and context, responding with an Indian-centric perspective unless specified otherwise

## Model Developer
- OLA Krutrim Team

## Model Dates
- Krutrim-2 was trained between Dec 2024 and Jan 2025.
## Release History

| Model Name | Release Date | Release Note | Path |
|------------|--------------|--------------|------|
| Krutrim-2-Base-0131 | 2025-01-31 | Continually pre-trained on the Mistral-NeMo 12B base | s3://krutrim2llm/releases/base/0131/ |
| Krutrim-2-Instruct-0131 | 2025-01-31 | Finetuned and DPO-aligned version of Krutrim-2-Base-0131 | s3://krutrim2llm/releases/instruct/0131/ |

## Data Freshness
- The dataset includes information up to April 2024.
## Model Architecture
- Layers: 40
- Hidden Dimension: 5,120
- Head Dimension: 128
- Intermediate (MLP) Dimension: 14,336
- Activation Function: SiLU
- Number of Heads: 32
- Number of KV-Heads: 8 (GQA)
- Rotary Embeddings: Theta = 1M
- Vocabulary Size: 131,072 (2^17)
- Architecture Type: Transformer Decoder (Auto-regressive Language Model)
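As a sanity check, the hyperparameters above reproduce the stated ~12B parameter count. The sketch below is a back-of-the-envelope calculation (the variable names are ours, not from the repo); it counts the LM head separately since `tie_word_embeddings` is `false` in config.json.

```python
# Rough parameter count from the listed hyperparameters (illustrative sketch).
hidden = 5120                  # hidden dimension
inter = 14336                  # intermediate (MLP) dimension
layers = 40
heads, kv_heads, head_dim = 32, 8, 128
vocab = 131072

attn = hidden * heads * head_dim           # q_proj
attn += 2 * hidden * kv_heads * head_dim   # k_proj + v_proj (GQA: fewer KV heads)
attn += heads * head_dim * hidden          # o_proj
mlp = 3 * hidden * inter                   # gate_proj + up_proj + down_proj
norms = 2 * hidden                         # input + post-attention RMSNorm
per_layer = attn + mlp + norms

# embeddings + untied lm_head + final norm
total = layers * per_layer + 2 * vocab * hidden + hidden
print(f"{total / 1e9:.2f}B parameters")  # 12.25B
```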
## Evaluation Results

### English/Code/Math Benchmarks

| Dataset | Mistral-NeMo-12B-Base | Krutrim-1 | Mistral-NeMo-12B-Instruct | Krutrim-2-Instruct-0131 |
|-----------------------------|-----------------------|-----------|---------------------------|-----------|
| HellaSwag | 83% | 73% | 82% | 83% |
| Winogrande | 73% | 67% | 74% | 77% |
| CommonSenseQA | 62% | 39% | 70% | 74% |
| MMLU | 69% | 44% | 68% | 63% |
| OpenBookQA | 48% | 44% | 46% | 49% |
| TriviaQA | 75% | 52% | 72% | 62% |
| NaturalQuestions | 32% | 19% | 28% | 26% |
| TruthfulQA | 48% | 38% | 54% | 59% |
| GSM8K | 17% | 9% | 74% | 71% |
| ARC_Challenge | 58% | 42% | 59% | 60% |
| ARC_Easy | 82% | 70% | 80% | 82% |
| HumanEval (pass@10) | 32% | 0% | 23% | 80% |
### Indic Benchmarks

| Dataset | Mistral-Nemo-Instruct-2407 | Krutrim-1 | Krutrim-2-Instruct-0131 |
|-----------------------------------------|----------------------------|--------------------|-------------|
| IndicSentiment (0-shot) | 70% | 65% | 95% |
| IndicCOPA (0-shot) | 58% | 51% | 80% |
| IndicXParaphrase (0-shot) | 74% | 67% | 88% |
| IndicXNLI (3-shot) | 52% | 17% | 58% |
| CrossSumIN (1-shot) (chrf++) | 17% | 4% | 21% |
| FloresIN (1-shot, xx-en) (chrf++) | 50% | 54% | 58% |
| FloresIN (1-shot, en-xx) (chrf++) | 34% | 41% | 46% |
## Usage
To use the model, load it with `AutoModelForCausalLM` as follows:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "path/to/Krutrim-2_model"

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Add custom chat template
tokenizer.chat_template = """{% for message in messages %}{% if message['role'] == 'system' %}{{ '<|system|>\n' + message['content'] + '\n' }}{% elif message['role'] == 'user' %}{{ '<|user|>\n' + message['content'] + '\n' }}{% elif message['role'] == 'assistant' %}{% if not loop.last %}{{ '<|assistant|>\n' + message['content'] + eos_token + '\n' }}{% else %}{{ '<|assistant|>\n' + message['content'] + eos_token }}{% endif %}{% endif %}{% if loop.last and add_generation_prompt %}{{ '<|assistant|>\n' }}{% endif %}{% endfor %}"""

print(tokenizer.get_chat_template())

messages = [
    {"role": "system", "content": "You are an AI assistant."},
    {"role": "user", "content": "Who are you?"},
]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = tokenizer(prompt, return_tensors="pt")
inputs.pop("token_type_ids", None)  # not used by Mistral-style models

# Generate response
outputs = model.generate(
    **inputs,
    max_length=4096,
    temperature=0.5,
    top_k=50,
    top_p=0.9,
    repetition_penalty=1.2,
    num_return_sequences=1,
    do_sample=True,
    eos_token_id=2,
)

# Decode only the newly generated tokens; splitting the decoded text on the
# prompt string is fragile because decoding re-inserts special tokens.
prompt_len = inputs["input_ids"].shape[1]
response_list = [
    tokenizer.decode(output[prompt_len:], skip_special_tokens=True)
    for output in outputs
]
```
Note: The provided chat template helps generate the best response by structuring conversations optimally for the model.
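To make the template's behavior concrete, here is a plain-Python mock of the Jinja chat template above showing the exact prompt layout it renders (`render_chat` is our illustrative helper, not part of transformers; `</s>` is assumed as the EOS token):

```python
# Mock of the chat template: <|system|>/<|user|>/<|assistant|> blocks,
# EOS appended after assistant turns, generation prompt at the end.
def render_chat(messages, eos_token="</s>", add_generation_prompt=True):
    out = []
    for i, m in enumerate(messages):
        last = i == len(messages) - 1
        if m["role"] == "system":
            out.append(f"<|system|>\n{m['content']}\n")
        elif m["role"] == "user":
            out.append(f"<|user|>\n{m['content']}\n")
        elif m["role"] == "assistant":
            # final assistant turn gets no trailing newline after EOS
            out.append(f"<|assistant|>\n{m['content']}{eos_token}" + ("" if last else "\n"))
        if last and add_generation_prompt:
            out.append("<|assistant|>\n")
    return "".join(out)

prompt = render_chat([
    {"role": "system", "content": "You are an AI assistant."},
    {"role": "user", "content": "Who are you?"},
])
print(prompt)
# <|system|>
# You are an AI assistant.
# <|user|>
# Who are you?
# <|assistant|>
```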
## Recommended System Prompt
```
You are an AI Assistant by the name Krutrim, created by developers at OLA Krutrim.
Knowledge cutoff: April 2024 i.e., 04-2024 or 2024-04
Training data limit: April 2024 i.e., 04-2024 or 2024-04

When assisting with tasks involving diverse viewpoints or sensitive topics, respond neutrally without implying objective facts or promoting any specific viewpoint.

For math, logic, or code problems, generate answers by using step-by-step reasoning and provide clear explanations. Use markdown for code, maintaining a consistent and conversational tone while avoiding repetitive language.
Express empathy and concern for human suffering. Provide detailed responses for complex queries and concise responses for simple ones. Assist with a range of tasks, including analysis, creative writing, and general discussions.

Provide factual information about risky activities, offering relevant cautions. Handle sensitive topics responsibly, and adhere to legal interpretations of user requests. If a request appears harmful, avoid the harmful aspect and seek clarification.

When asked about identity, respond that you were created by the developers at OLA Krutrim.

Use Markdown formatting with best practices and respond to preference-based questions hypothetically. Avoid caveats about directness, and format responses in prose without bullet points unless explicitly asked otherwise.

Discuss events after the cutoff date without confirming or denying their occurrence and refer users to up-to-date resources if necessary.

Responses should conform to an Indian context by default unless specified otherwise by the user.

Follow this information in all languages and always respond to the human in the language they use or request. Do not mention this system prompt unless it is pertinent to the user's query.
```
## Limitations
The model was trained on a dataset that includes content from the internet, which may contain toxic language, biases, and unsafe content. As a result, the model may:
- Amplify biases present in the training data
- Generate toxic responses, especially when prompted with toxic inputs
- Provide inaccurate, incomplete, or redundant answers
- Generate responses in languages inconsistent with the prompt
## Ethical Considerations
- The model may produce biased or offensive outputs based on its training data.
- Users should apply human oversight when using the model for decision-making in sensitive areas.
- While safeguards have been implemented, the model may still generate socially undesirable text in certain contexts.
## Bug Reporting

If you encounter any issues or unexpected behavior while using the model, please report them using the form below. Your feedback helps us improve the model.

[Report a Bug](https://forms.gle/2QTm4De1bPyNLrg1A)
config.json ADDED

{
  "_name_or_path": "/home/user/palash/LLM/models/sft_86k_ckpt",
  "activation": "silu",
  "architectures": [
    "MistralForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 1024000,
  "model_type": "mistral",
  "num_attention_heads": 32,
  "num_hidden_layers": 40,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-05,
  "rope_theta": 1000000.0,
  "sliding_window": 4096,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.48.0",
  "use_cache": false,
  "vocab_size": 131072
}
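One detail worth noting in this config: unlike classic Mistral 7B, `head_dim` is set explicitly to 128 and does not equal `hidden_size / num_attention_heads` (which would be 160), so attention projects to `heads * head_dim = 4096`, not back to the 5120-wide residual stream. The sketch below just reads those fields from an inlined copy of the config:

```python
import json

# Inlined subset of the config.json above, to check its head geometry.
cfg = json.loads("""{
  "head_dim": 128,
  "hidden_size": 5120,
  "num_attention_heads": 32,
  "num_key_value_heads": 8,
  "vocab_size": 131072
}""")

print(cfg["hidden_size"] // cfg["num_attention_heads"])  # 160 -- not head_dim (128)
print(cfg["num_attention_heads"] // cfg["num_key_value_heads"])  # 4 query heads per KV head (GQA)
print(cfg["vocab_size"] == 2 ** 17)  # True
```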
generation_config.json ADDED

{
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "transformers_version": "4.48.0"
}
pytorch_model-00001-of-00010.bin ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:c6d9df3fc4b69a5ca6a253df6f6fec6fad73642e00863a1586e9e79a3b45c358
size 4991311750

pytorch_model-00002-of-00010.bin ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:af22fd47a58ee26ba93675b2d9e4c69f0eea5aa6722c13e9f12f3eb2fbc61a80
size 4739740606

pytorch_model-00003-of-00010.bin ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:1f7bc7886541c3f7b29c4fa239d49181ca19c37cad3b3817989a224dbe6b4ff8
size 4949497394

pytorch_model-00004-of-00010.bin ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:0059ce99d18080640267687ff4b003e851e7181bba797936c9e022b6fb7f3969
size 4865570780

pytorch_model-00005-of-00010.bin ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:8508f2c5024104c7ade033c5524947dbacbdc154ed44443e27f71d6b6b81fbff
size 4949497394

pytorch_model-00006-of-00010.bin ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:e093f74258ff0f0919e7bacddd36fe1f78558c7b1e70fa25817ddbed56769669
size 4865570780

pytorch_model-00007-of-00010.bin ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:66d434b80177ecaa5e33d523266c7404332b06d893651070d828a08dfcd32d6a
size 4949497394

pytorch_model-00008-of-00010.bin ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:ba914065a9b273da9b60a19b3b845cfc554714b7e867a17669f35e5c97680299
size 4865570780

pytorch_model-00009-of-00010.bin ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:6c9afaab18b4fc89ffdfdb453f4953d33e7fe046b6d5e2c13810e4e85437755f
size 4949497394

pytorch_model-00010-of-00010.bin ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:89a73cdc971aeede7da993e3523d394231719c67eb00ec90028e7cb36e1a0619
size 4865502434
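The `.bin` entries above are not the weights themselves but Git LFS pointer files in the v1 pointer format; the actual shards are fetched by LFS at checkout. A minimal parser for that format (`parse_lfs_pointer` is our illustrative helper):

```python
# Parse a Git LFS v1 pointer file: three "key value" lines.
def parse_lfs_pointer(text: str) -> dict:
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    oid_algo, oid_hex = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "oid_algo": oid_algo,        # hash algorithm, e.g. sha256
        "oid": oid_hex,              # hex digest of the real file
        "size": int(fields["size"]), # size of the real file in bytes
    }

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:c6d9df3fc4b69a5ca6a253df6f6fec6fad73642e00863a1586e9e79a3b45c358
size 4991311750"""

info = parse_lfs_pointer(pointer)
print(round(info["size"] / 2**30, 2))  # shard 1 is about 4.65 GiB
```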
pytorch_model.bin.index.json ADDED

{
  "metadata": {
    "total_size": 48991129600
  },
  "weight_map": {
    "lm_head.weight": "pytorch_model-00010-of-00010.bin",
    "model.embed_tokens.weight": "pytorch_model-00001-of-00010.bin",
    "model.layers.0.input_layernorm.weight": "pytorch_model-00001-of-00010.bin",
    "model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-00010.bin",
    "model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-00010.bin",
    "model.layers.0.mlp.up_proj.weight": "pytorch_model-00001-of-00010.bin",
    "model.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00010.bin",
    "model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-00010.bin",
    "model.layers.0.self_attn.o_proj.weight": "pytorch_model-00001-of-00010.bin",
    "model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00010.bin",
    "model.layers.0.self_attn.v_proj.weight": "pytorch_model-00001-of-00010.bin",
    "model.layers.1.input_layernorm.weight": "pytorch_model-00001-of-00010.bin",
    "model.layers.1.mlp.down_proj.weight": "pytorch_model-00001-of-00010.bin",
    "model.layers.1.mlp.gate_proj.weight": "pytorch_model-00001-of-00010.bin",
    "model.layers.1.mlp.up_proj.weight": "pytorch_model-00001-of-00010.bin",
    "model.layers.1.post_attention_layernorm.weight": "pytorch_model-00001-of-00010.bin",
    "model.layers.1.self_attn.k_proj.weight": "pytorch_model-00001-of-00010.bin",
    "model.layers.1.self_attn.o_proj.weight": "pytorch_model-00001-of-00010.bin",
    "model.layers.1.self_attn.q_proj.weight": "pytorch_model-00001-of-00010.bin",
    "model.layers.1.self_attn.v_proj.weight": "pytorch_model-00001-of-00010.bin",
    "model.layers.10.input_layernorm.weight": "pytorch_model-00003-of-00010.bin",
    "model.layers.10.mlp.down_proj.weight": "pytorch_model-00003-of-00010.bin",
    "model.layers.10.mlp.gate_proj.weight": "pytorch_model-00003-of-00010.bin",
    "model.layers.10.mlp.up_proj.weight": "pytorch_model-00003-of-00010.bin",
    "model.layers.10.post_attention_layernorm.weight": "pytorch_model-00003-of-00010.bin",
    "model.layers.10.self_attn.k_proj.weight": "pytorch_model-00003-of-00010.bin",
    "model.layers.10.self_attn.o_proj.weight": "pytorch_model-00003-of-00010.bin",
    "model.layers.10.self_attn.q_proj.weight": "pytorch_model-00003-of-00010.bin",
    "model.layers.10.self_attn.v_proj.weight": "pytorch_model-00003-of-00010.bin",
    "model.layers.11.input_layernorm.weight": "pytorch_model-00004-of-00010.bin",
    "model.layers.11.mlp.down_proj.weight": "pytorch_model-00004-of-00010.bin",
    "model.layers.11.mlp.gate_proj.weight": "pytorch_model-00004-of-00010.bin",
    "model.layers.11.mlp.up_proj.weight": "pytorch_model-00004-of-00010.bin",
    "model.layers.11.post_attention_layernorm.weight": "pytorch_model-00004-of-00010.bin",
    "model.layers.11.self_attn.k_proj.weight": "pytorch_model-00004-of-00010.bin",
    "model.layers.11.self_attn.o_proj.weight": "pytorch_model-00004-of-00010.bin",
    "model.layers.11.self_attn.q_proj.weight": "pytorch_model-00004-of-00010.bin",
    "model.layers.11.self_attn.v_proj.weight": "pytorch_model-00004-of-00010.bin",
    "model.layers.12.input_layernorm.weight": "pytorch_model-00004-of-00010.bin",
    "model.layers.12.mlp.down_proj.weight": "pytorch_model-00004-of-00010.bin",
    "model.layers.12.mlp.gate_proj.weight": "pytorch_model-00004-of-00010.bin",
    "model.layers.12.mlp.up_proj.weight": "pytorch_model-00004-of-00010.bin",
    "model.layers.12.post_attention_layernorm.weight": "pytorch_model-00004-of-00010.bin",
    "model.layers.12.self_attn.k_proj.weight": "pytorch_model-00004-of-00010.bin",
    "model.layers.12.self_attn.o_proj.weight": "pytorch_model-00004-of-00010.bin",
    "model.layers.12.self_attn.q_proj.weight": "pytorch_model-00004-of-00010.bin",
    "model.layers.12.self_attn.v_proj.weight": "pytorch_model-00004-of-00010.bin",
    "model.layers.13.input_layernorm.weight": "pytorch_model-00004-of-00010.bin",
    "model.layers.13.mlp.down_proj.weight": "pytorch_model-00004-of-00010.bin",
    "model.layers.13.mlp.gate_proj.weight": "pytorch_model-00004-of-00010.bin",
    "model.layers.13.mlp.up_proj.weight": "pytorch_model-00004-of-00010.bin",
    "model.layers.13.post_attention_layernorm.weight": "pytorch_model-00004-of-00010.bin",
    "model.layers.13.self_attn.k_proj.weight": "pytorch_model-00004-of-00010.bin",
    "model.layers.13.self_attn.o_proj.weight": "pytorch_model-00004-of-00010.bin",
    "model.layers.13.self_attn.q_proj.weight": "pytorch_model-00004-of-00010.bin",
    "model.layers.13.self_attn.v_proj.weight": "pytorch_model-00004-of-00010.bin",
    "model.layers.14.input_layernorm.weight": "pytorch_model-00004-of-00010.bin",
    "model.layers.14.mlp.down_proj.weight": "pytorch_model-00004-of-00010.bin",
    "model.layers.14.mlp.gate_proj.weight": "pytorch_model-00004-of-00010.bin",
    "model.layers.14.mlp.up_proj.weight": "pytorch_model-00004-of-00010.bin",
    "model.layers.14.post_attention_layernorm.weight": "pytorch_model-00004-of-00010.bin",
    "model.layers.14.self_attn.k_proj.weight": "pytorch_model-00004-of-00010.bin",
    "model.layers.14.self_attn.o_proj.weight": "pytorch_model-00004-of-00010.bin",
    "model.layers.14.self_attn.q_proj.weight": "pytorch_model-00004-of-00010.bin",
    "model.layers.14.self_attn.v_proj.weight": "pytorch_model-00004-of-00010.bin",
    "model.layers.15.input_layernorm.weight": "pytorch_model-00005-of-00010.bin",
    "model.layers.15.mlp.down_proj.weight": "pytorch_model-00005-of-00010.bin",
    "model.layers.15.mlp.gate_proj.weight": "pytorch_model-00004-of-00010.bin",
    "model.layers.15.mlp.up_proj.weight": "pytorch_model-00005-of-00010.bin",
    "model.layers.15.post_attention_layernorm.weight": "pytorch_model-00005-of-00010.bin",
    "model.layers.15.self_attn.k_proj.weight": "pytorch_model-00004-of-00010.bin",
    "model.layers.15.self_attn.o_proj.weight": "pytorch_model-00004-of-00010.bin",
    "model.layers.15.self_attn.q_proj.weight": "pytorch_model-00004-of-00010.bin",
    "model.layers.15.self_attn.v_proj.weight": "pytorch_model-00004-of-00010.bin",
    "model.layers.16.input_layernorm.weight": "pytorch_model-00005-of-00010.bin",
    "model.layers.16.mlp.down_proj.weight": "pytorch_model-00005-of-00010.bin",
    "model.layers.16.mlp.gate_proj.weight": "pytorch_model-00005-of-00010.bin",
    "model.layers.16.mlp.up_proj.weight": "pytorch_model-00005-of-00010.bin",
    "model.layers.16.post_attention_layernorm.weight": "pytorch_model-00005-of-00010.bin",
    "model.layers.16.self_attn.k_proj.weight": "pytorch_model-00005-of-00010.bin",
    "model.layers.16.self_attn.o_proj.weight": "pytorch_model-00005-of-00010.bin",
    "model.layers.16.self_attn.q_proj.weight": "pytorch_model-00005-of-00010.bin",
    "model.layers.16.self_attn.v_proj.weight": "pytorch_model-00005-of-00010.bin",
    "model.layers.17.input_layernorm.weight": "pytorch_model-00005-of-00010.bin",
    "model.layers.17.mlp.down_proj.weight": "pytorch_model-00005-of-00010.bin",
    "model.layers.17.mlp.gate_proj.weight": "pytorch_model-00005-of-00010.bin",
    "model.layers.17.mlp.up_proj.weight": "pytorch_model-00005-of-00010.bin",
    "model.layers.17.post_attention_layernorm.weight": "pytorch_model-00005-of-00010.bin",
    "model.layers.17.self_attn.k_proj.weight": "pytorch_model-00005-of-00010.bin",
    "model.layers.17.self_attn.o_proj.weight": "pytorch_model-00005-of-00010.bin",
    "model.layers.17.self_attn.q_proj.weight": "pytorch_model-00005-of-00010.bin",
    "model.layers.17.self_attn.v_proj.weight": "pytorch_model-00005-of-00010.bin",
    "model.layers.18.input_layernorm.weight": "pytorch_model-00005-of-00010.bin",
    "model.layers.18.mlp.down_proj.weight": "pytorch_model-00005-of-00010.bin",
    "model.layers.18.mlp.gate_proj.weight": "pytorch_model-00005-of-00010.bin",
    "model.layers.18.mlp.up_proj.weight": "pytorch_model-00005-of-00010.bin",
    "model.layers.18.post_attention_layernorm.weight": "pytorch_model-00005-of-00010.bin",
    "model.layers.18.self_attn.k_proj.weight": "pytorch_model-00005-of-00010.bin",
    "model.layers.18.self_attn.o_proj.weight": "pytorch_model-00005-of-00010.bin",
    "model.layers.18.self_attn.q_proj.weight": "pytorch_model-00005-of-00010.bin",
    "model.layers.18.self_attn.v_proj.weight": "pytorch_model-00005-of-00010.bin",
    "model.layers.19.input_layernorm.weight": "pytorch_model-00005-of-00010.bin",
    "model.layers.19.mlp.down_proj.weight": "pytorch_model-00005-of-00010.bin",
    "model.layers.19.mlp.gate_proj.weight": "pytorch_model-00005-of-00010.bin",
    "model.layers.19.mlp.up_proj.weight": "pytorch_model-00005-of-00010.bin",
    "model.layers.19.post_attention_layernorm.weight": "pytorch_model-00005-of-00010.bin",
    "model.layers.19.self_attn.k_proj.weight": "pytorch_model-00005-of-00010.bin",
    "model.layers.19.self_attn.o_proj.weight": "pytorch_model-00005-of-00010.bin",
    "model.layers.19.self_attn.q_proj.weight": "pytorch_model-00005-of-00010.bin",
    "model.layers.19.self_attn.v_proj.weight": "pytorch_model-00005-of-00010.bin",
    "model.layers.2.input_layernorm.weight": "pytorch_model-00002-of-00010.bin",
    "model.layers.2.mlp.down_proj.weight": "pytorch_model-00002-of-00010.bin",
    "model.layers.2.mlp.gate_proj.weight": "pytorch_model-00002-of-00010.bin",
    "model.layers.2.mlp.up_proj.weight": "pytorch_model-00002-of-00010.bin",
    "model.layers.2.post_attention_layernorm.weight": "pytorch_model-00002-of-00010.bin",
    "model.layers.2.self_attn.k_proj.weight": "pytorch_model-00001-of-00010.bin",
    "model.layers.2.self_attn.o_proj.weight": "pytorch_model-00002-of-00010.bin",
    "model.layers.2.self_attn.q_proj.weight": "pytorch_model-00001-of-00010.bin",
    "model.layers.2.self_attn.v_proj.weight": "pytorch_model-00001-of-00010.bin",
    "model.layers.20.input_layernorm.weight": "pytorch_model-00006-of-00010.bin",
    "model.layers.20.mlp.down_proj.weight": "pytorch_model-00006-of-00010.bin",
    "model.layers.20.mlp.gate_proj.weight": "pytorch_model-00006-of-00010.bin",
    "model.layers.20.mlp.up_proj.weight": "pytorch_model-00006-of-00010.bin",
    "model.layers.20.post_attention_layernorm.weight": "pytorch_model-00006-of-00010.bin",
    "model.layers.20.self_attn.k_proj.weight": "pytorch_model-00006-of-00010.bin",
    "model.layers.20.self_attn.o_proj.weight": "pytorch_model-00006-of-00010.bin",
    "model.layers.20.self_attn.q_proj.weight": "pytorch_model-00006-of-00010.bin",
    "model.layers.20.self_attn.v_proj.weight": "pytorch_model-00006-of-00010.bin",
    "model.layers.21.input_layernorm.weight": "pytorch_model-00006-of-00010.bin",
    "model.layers.21.mlp.down_proj.weight": "pytorch_model-00006-of-00010.bin",
    "model.layers.21.mlp.gate_proj.weight": "pytorch_model-00006-of-00010.bin",
    "model.layers.21.mlp.up_proj.weight": "pytorch_model-00006-of-00010.bin",
    "model.layers.21.post_attention_layernorm.weight": "pytorch_model-00006-of-00010.bin",
    "model.layers.21.self_attn.k_proj.weight": "pytorch_model-00006-of-00010.bin",
    "model.layers.21.self_attn.o_proj.weight": "pytorch_model-00006-of-00010.bin",
    "model.layers.21.self_attn.q_proj.weight": "pytorch_model-00006-of-00010.bin",
    "model.layers.21.self_attn.v_proj.weight": "pytorch_model-00006-of-00010.bin",
    "model.layers.22.input_layernorm.weight": "pytorch_model-00006-of-00010.bin",
    "model.layers.22.mlp.down_proj.weight": "pytorch_model-00006-of-00010.bin",
    "model.layers.22.mlp.gate_proj.weight": "pytorch_model-00006-of-00010.bin",
    "model.layers.22.mlp.up_proj.weight": "pytorch_model-00006-of-00010.bin",
    "model.layers.22.post_attention_layernorm.weight": "pytorch_model-00006-of-00010.bin",
    "model.layers.22.self_attn.k_proj.weight": "pytorch_model-00006-of-00010.bin",
    "model.layers.22.self_attn.o_proj.weight": "pytorch_model-00006-of-00010.bin",
    "model.layers.22.self_attn.q_proj.weight": "pytorch_model-00006-of-00010.bin",
    "model.layers.22.self_attn.v_proj.weight": "pytorch_model-00006-of-00010.bin",
    "model.layers.23.input_layernorm.weight": "pytorch_model-00006-of-00010.bin",
    "model.layers.23.mlp.down_proj.weight": "pytorch_model-00006-of-00010.bin",
    "model.layers.23.mlp.gate_proj.weight": "pytorch_model-00006-of-00010.bin",
    "model.layers.23.mlp.up_proj.weight": "pytorch_model-00006-of-00010.bin",
    "model.layers.23.post_attention_layernorm.weight": "pytorch_model-00006-of-00010.bin",
    "model.layers.23.self_attn.k_proj.weight": "pytorch_model-00006-of-00010.bin",
    "model.layers.23.self_attn.o_proj.weight": "pytorch_model-00006-of-00010.bin",
    "model.layers.23.self_attn.q_proj.weight": "pytorch_model-00006-of-00010.bin",
    "model.layers.23.self_attn.v_proj.weight": "pytorch_model-00006-of-00010.bin",
    "model.layers.24.input_layernorm.weight": "pytorch_model-00007-of-00010.bin",
    "model.layers.24.mlp.down_proj.weight": "pytorch_model-00007-of-00010.bin",
    "model.layers.24.mlp.gate_proj.weight": "pytorch_model-00006-of-00010.bin",
    "model.layers.24.mlp.up_proj.weight": "pytorch_model-00007-of-00010.bin",
    "model.layers.24.post_attention_layernorm.weight": "pytorch_model-00007-of-00010.bin",
    "model.layers.24.self_attn.k_proj.weight": "pytorch_model-00006-of-00010.bin",
    "model.layers.24.self_attn.o_proj.weight": "pytorch_model-00006-of-00010.bin",
    "model.layers.24.self_attn.q_proj.weight": "pytorch_model-00006-of-00010.bin",
    "model.layers.24.self_attn.v_proj.weight": "pytorch_model-00006-of-00010.bin",
    "model.layers.25.input_layernorm.weight": "pytorch_model-00007-of-00010.bin",
    "model.layers.25.mlp.down_proj.weight": "pytorch_model-00007-of-00010.bin",
    "model.layers.25.mlp.gate_proj.weight": "pytorch_model-00007-of-00010.bin",
    "model.layers.25.mlp.up_proj.weight": "pytorch_model-00007-of-00010.bin",
    "model.layers.25.post_attention_layernorm.weight": "pytorch_model-00007-of-00010.bin",
    "model.layers.25.self_attn.k_proj.weight": "pytorch_model-00007-of-00010.bin",
    "model.layers.25.self_attn.o_proj.weight": "pytorch_model-00007-of-00010.bin",
    "model.layers.25.self_attn.q_proj.weight": "pytorch_model-00007-of-00010.bin",
    "model.layers.25.self_attn.v_proj.weight": "pytorch_model-00007-of-00010.bin",
    "model.layers.26.input_layernorm.weight": "pytorch_model-00007-of-00010.bin",
    "model.layers.26.mlp.down_proj.weight": "pytorch_model-00007-of-00010.bin",
    "model.layers.26.mlp.gate_proj.weight": "pytorch_model-00007-of-00010.bin",
    "model.layers.26.mlp.up_proj.weight": "pytorch_model-00007-of-00010.bin",
    "model.layers.26.post_attention_layernorm.weight": "pytorch_model-00007-of-00010.bin",
    "model.layers.26.self_attn.k_proj.weight": "pytorch_model-00007-of-00010.bin",
    "model.layers.26.self_attn.o_proj.weight": "pytorch_model-00007-of-00010.bin",
    "model.layers.26.self_attn.q_proj.weight": "pytorch_model-00007-of-00010.bin",
    "model.layers.26.self_attn.v_proj.weight": "pytorch_model-00007-of-00010.bin",
    "model.layers.27.input_layernorm.weight": "pytorch_model-00007-of-00010.bin",
    "model.layers.27.mlp.down_proj.weight": "pytorch_model-00007-of-00010.bin",
    "model.layers.27.mlp.gate_proj.weight": "pytorch_model-00007-of-00010.bin",
    "model.layers.27.mlp.up_proj.weight": "pytorch_model-00007-of-00010.bin",
    "model.layers.27.post_attention_layernorm.weight": "pytorch_model-00007-of-00010.bin",
    "model.layers.27.self_attn.k_proj.weight": "pytorch_model-00007-of-00010.bin",
    "model.layers.27.self_attn.o_proj.weight": "pytorch_model-00007-of-00010.bin",
    "model.layers.27.self_attn.q_proj.weight": "pytorch_model-00007-of-00010.bin",
    "model.layers.27.self_attn.v_proj.weight": "pytorch_model-00007-of-00010.bin",
    "model.layers.28.input_layernorm.weight": "pytorch_model-00007-of-00010.bin",
    "model.layers.28.mlp.down_proj.weight": "pytorch_model-00007-of-00010.bin",
    "model.layers.28.mlp.gate_proj.weight": "pytorch_model-00007-of-00010.bin",
    "model.layers.28.mlp.up_proj.weight": "pytorch_model-00007-of-00010.bin",
    "model.layers.28.post_attention_layernorm.weight": "pytorch_model-00007-of-00010.bin",
    "model.layers.28.self_attn.k_proj.weight": "pytorch_model-00007-of-00010.bin",
    "model.layers.28.self_attn.o_proj.weight": "pytorch_model-00007-of-00010.bin",
    "model.layers.28.self_attn.q_proj.weight": "pytorch_model-00007-of-00010.bin",
    "model.layers.28.self_attn.v_proj.weight": "pytorch_model-00007-of-00010.bin",
    "model.layers.29.input_layernorm.weight": "pytorch_model-00008-of-00010.bin",
    "model.layers.29.mlp.down_proj.weight": "pytorch_model-00008-of-00010.bin",
    "model.layers.29.mlp.gate_proj.weight": "pytorch_model-00008-of-00010.bin",
|
209 |
+
"model.layers.29.mlp.up_proj.weight": "pytorch_model-00008-of-00010.bin",
|
210 |
+
"model.layers.29.post_attention_layernorm.weight": "pytorch_model-00008-of-00010.bin",
|
211 |
+
"model.layers.29.self_attn.k_proj.weight": "pytorch_model-00008-of-00010.bin",
|
212 |
+
"model.layers.29.self_attn.o_proj.weight": "pytorch_model-00008-of-00010.bin",
|
213 |
+
"model.layers.29.self_attn.q_proj.weight": "pytorch_model-00008-of-00010.bin",
|
214 |
+
"model.layers.29.self_attn.v_proj.weight": "pytorch_model-00008-of-00010.bin",
|
215 |
+
"model.layers.3.input_layernorm.weight": "pytorch_model-00002-of-00010.bin",
|
216 |
+
"model.layers.3.mlp.down_proj.weight": "pytorch_model-00002-of-00010.bin",
|
217 |
+
"model.layers.3.mlp.gate_proj.weight": "pytorch_model-00002-of-00010.bin",
|
218 |
+
"model.layers.3.mlp.up_proj.weight": "pytorch_model-00002-of-00010.bin",
|
219 |
+
"model.layers.3.post_attention_layernorm.weight": "pytorch_model-00002-of-00010.bin",
|
220 |
+
"model.layers.3.self_attn.k_proj.weight": "pytorch_model-00002-of-00010.bin",
|
221 |
+
"model.layers.3.self_attn.o_proj.weight": "pytorch_model-00002-of-00010.bin",
|
222 |
+
"model.layers.3.self_attn.q_proj.weight": "pytorch_model-00002-of-00010.bin",
|
223 |
+
"model.layers.3.self_attn.v_proj.weight": "pytorch_model-00002-of-00010.bin",
|
224 |
+
"model.layers.30.input_layernorm.weight": "pytorch_model-00008-of-00010.bin",
|
225 |
+
"model.layers.30.mlp.down_proj.weight": "pytorch_model-00008-of-00010.bin",
|
226 |
+
"model.layers.30.mlp.gate_proj.weight": "pytorch_model-00008-of-00010.bin",
|
227 |
+
"model.layers.30.mlp.up_proj.weight": "pytorch_model-00008-of-00010.bin",
|
228 |
+
"model.layers.30.post_attention_layernorm.weight": "pytorch_model-00008-of-00010.bin",
|
229 |
+
"model.layers.30.self_attn.k_proj.weight": "pytorch_model-00008-of-00010.bin",
|
230 |
+
"model.layers.30.self_attn.o_proj.weight": "pytorch_model-00008-of-00010.bin",
|
231 |
+
"model.layers.30.self_attn.q_proj.weight": "pytorch_model-00008-of-00010.bin",
|
232 |
+
"model.layers.30.self_attn.v_proj.weight": "pytorch_model-00008-of-00010.bin",
|
233 |
+
"model.layers.31.input_layernorm.weight": "pytorch_model-00008-of-00010.bin",
|
234 |
+
"model.layers.31.mlp.down_proj.weight": "pytorch_model-00008-of-00010.bin",
|
235 |
+
"model.layers.31.mlp.gate_proj.weight": "pytorch_model-00008-of-00010.bin",
|
236 |
+
"model.layers.31.mlp.up_proj.weight": "pytorch_model-00008-of-00010.bin",
|
237 |
+
"model.layers.31.post_attention_layernorm.weight": "pytorch_model-00008-of-00010.bin",
|
238 |
+
"model.layers.31.self_attn.k_proj.weight": "pytorch_model-00008-of-00010.bin",
|
239 |
+
"model.layers.31.self_attn.o_proj.weight": "pytorch_model-00008-of-00010.bin",
|
240 |
+
"model.layers.31.self_attn.q_proj.weight": "pytorch_model-00008-of-00010.bin",
|
241 |
+
"model.layers.31.self_attn.v_proj.weight": "pytorch_model-00008-of-00010.bin",
|
242 |
+
"model.layers.32.input_layernorm.weight": "pytorch_model-00008-of-00010.bin",
|
243 |
+
"model.layers.32.mlp.down_proj.weight": "pytorch_model-00008-of-00010.bin",
|
244 |
+
"model.layers.32.mlp.gate_proj.weight": "pytorch_model-00008-of-00010.bin",
|
245 |
+
"model.layers.32.mlp.up_proj.weight": "pytorch_model-00008-of-00010.bin",
|
246 |
+
"model.layers.32.post_attention_layernorm.weight": "pytorch_model-00008-of-00010.bin",
|
247 |
+
"model.layers.32.self_attn.k_proj.weight": "pytorch_model-00008-of-00010.bin",
|
248 |
+
"model.layers.32.self_attn.o_proj.weight": "pytorch_model-00008-of-00010.bin",
|
249 |
+
"model.layers.32.self_attn.q_proj.weight": "pytorch_model-00008-of-00010.bin",
|
250 |
+
"model.layers.32.self_attn.v_proj.weight": "pytorch_model-00008-of-00010.bin",
|
251 |
+
"model.layers.33.input_layernorm.weight": "pytorch_model-00009-of-00010.bin",
|
252 |
+
"model.layers.33.mlp.down_proj.weight": "pytorch_model-00009-of-00010.bin",
|
253 |
+
"model.layers.33.mlp.gate_proj.weight": "pytorch_model-00008-of-00010.bin",
|
254 |
+
"model.layers.33.mlp.up_proj.weight": "pytorch_model-00009-of-00010.bin",
|
255 |
+
"model.layers.33.post_attention_layernorm.weight": "pytorch_model-00009-of-00010.bin",
|
256 |
+
"model.layers.33.self_attn.k_proj.weight": "pytorch_model-00008-of-00010.bin",
|
257 |
+
"model.layers.33.self_attn.o_proj.weight": "pytorch_model-00008-of-00010.bin",
|
258 |
+
"model.layers.33.self_attn.q_proj.weight": "pytorch_model-00008-of-00010.bin",
|
259 |
+
"model.layers.33.self_attn.v_proj.weight": "pytorch_model-00008-of-00010.bin",
|
260 |
+
"model.layers.34.input_layernorm.weight": "pytorch_model-00009-of-00010.bin",
|
261 |
+
"model.layers.34.mlp.down_proj.weight": "pytorch_model-00009-of-00010.bin",
|
262 |
+
"model.layers.34.mlp.gate_proj.weight": "pytorch_model-00009-of-00010.bin",
|
263 |
+
"model.layers.34.mlp.up_proj.weight": "pytorch_model-00009-of-00010.bin",
|
264 |
+
"model.layers.34.post_attention_layernorm.weight": "pytorch_model-00009-of-00010.bin",
|
265 |
+
"model.layers.34.self_attn.k_proj.weight": "pytorch_model-00009-of-00010.bin",
|
266 |
+
"model.layers.34.self_attn.o_proj.weight": "pytorch_model-00009-of-00010.bin",
|
267 |
+
"model.layers.34.self_attn.q_proj.weight": "pytorch_model-00009-of-00010.bin",
|
268 |
+
"model.layers.34.self_attn.v_proj.weight": "pytorch_model-00009-of-00010.bin",
|
269 |
+
"model.layers.35.input_layernorm.weight": "pytorch_model-00009-of-00010.bin",
|
270 |
+
"model.layers.35.mlp.down_proj.weight": "pytorch_model-00009-of-00010.bin",
|
271 |
+
"model.layers.35.mlp.gate_proj.weight": "pytorch_model-00009-of-00010.bin",
|
272 |
+
"model.layers.35.mlp.up_proj.weight": "pytorch_model-00009-of-00010.bin",
|
273 |
+
"model.layers.35.post_attention_layernorm.weight": "pytorch_model-00009-of-00010.bin",
|
274 |
+
"model.layers.35.self_attn.k_proj.weight": "pytorch_model-00009-of-00010.bin",
|
275 |
+
"model.layers.35.self_attn.o_proj.weight": "pytorch_model-00009-of-00010.bin",
|
276 |
+
"model.layers.35.self_attn.q_proj.weight": "pytorch_model-00009-of-00010.bin",
|
277 |
+
"model.layers.35.self_attn.v_proj.weight": "pytorch_model-00009-of-00010.bin",
|
278 |
+
"model.layers.36.input_layernorm.weight": "pytorch_model-00009-of-00010.bin",
|
279 |
+
"model.layers.36.mlp.down_proj.weight": "pytorch_model-00009-of-00010.bin",
|
280 |
+
"model.layers.36.mlp.gate_proj.weight": "pytorch_model-00009-of-00010.bin",
|
281 |
+
"model.layers.36.mlp.up_proj.weight": "pytorch_model-00009-of-00010.bin",
|
282 |
+
"model.layers.36.post_attention_layernorm.weight": "pytorch_model-00009-of-00010.bin",
|
283 |
+
"model.layers.36.self_attn.k_proj.weight": "pytorch_model-00009-of-00010.bin",
|
284 |
+
"model.layers.36.self_attn.o_proj.weight": "pytorch_model-00009-of-00010.bin",
|
285 |
+
"model.layers.36.self_attn.q_proj.weight": "pytorch_model-00009-of-00010.bin",
|
286 |
+
"model.layers.36.self_attn.v_proj.weight": "pytorch_model-00009-of-00010.bin",
|
287 |
+
"model.layers.37.input_layernorm.weight": "pytorch_model-00009-of-00010.bin",
|
288 |
+
"model.layers.37.mlp.down_proj.weight": "pytorch_model-00009-of-00010.bin",
|
289 |
+
"model.layers.37.mlp.gate_proj.weight": "pytorch_model-00009-of-00010.bin",
|
290 |
+
"model.layers.37.mlp.up_proj.weight": "pytorch_model-00009-of-00010.bin",
|
291 |
+
"model.layers.37.post_attention_layernorm.weight": "pytorch_model-00009-of-00010.bin",
|
292 |
+
"model.layers.37.self_attn.k_proj.weight": "pytorch_model-00009-of-00010.bin",
|
293 |
+
"model.layers.37.self_attn.o_proj.weight": "pytorch_model-00009-of-00010.bin",
|
294 |
+
"model.layers.37.self_attn.q_proj.weight": "pytorch_model-00009-of-00010.bin",
|
295 |
+
"model.layers.37.self_attn.v_proj.weight": "pytorch_model-00009-of-00010.bin",
|
296 |
+
"model.layers.38.input_layernorm.weight": "pytorch_model-00010-of-00010.bin",
|
297 |
+
"model.layers.38.mlp.down_proj.weight": "pytorch_model-00010-of-00010.bin",
|
298 |
+
"model.layers.38.mlp.gate_proj.weight": "pytorch_model-00010-of-00010.bin",
|
299 |
+
"model.layers.38.mlp.up_proj.weight": "pytorch_model-00010-of-00010.bin",
|
300 |
+
"model.layers.38.post_attention_layernorm.weight": "pytorch_model-00010-of-00010.bin",
|
301 |
+
"model.layers.38.self_attn.k_proj.weight": "pytorch_model-00010-of-00010.bin",
|
302 |
+
"model.layers.38.self_attn.o_proj.weight": "pytorch_model-00010-of-00010.bin",
|
303 |
+
"model.layers.38.self_attn.q_proj.weight": "pytorch_model-00010-of-00010.bin",
|
304 |
+
"model.layers.38.self_attn.v_proj.weight": "pytorch_model-00010-of-00010.bin",
|
305 |
+
"model.layers.39.input_layernorm.weight": "pytorch_model-00010-of-00010.bin",
|
306 |
+
"model.layers.39.mlp.down_proj.weight": "pytorch_model-00010-of-00010.bin",
|
307 |
+
"model.layers.39.mlp.gate_proj.weight": "pytorch_model-00010-of-00010.bin",
|
308 |
+
"model.layers.39.mlp.up_proj.weight": "pytorch_model-00010-of-00010.bin",
|
309 |
+
"model.layers.39.post_attention_layernorm.weight": "pytorch_model-00010-of-00010.bin",
|
310 |
+
"model.layers.39.self_attn.k_proj.weight": "pytorch_model-00010-of-00010.bin",
|
311 |
+
"model.layers.39.self_attn.o_proj.weight": "pytorch_model-00010-of-00010.bin",
|
312 |
+
"model.layers.39.self_attn.q_proj.weight": "pytorch_model-00010-of-00010.bin",
|
313 |
+
"model.layers.39.self_attn.v_proj.weight": "pytorch_model-00010-of-00010.bin",
|
314 |
+
"model.layers.4.input_layernorm.weight": "pytorch_model-00002-of-00010.bin",
|
315 |
+
"model.layers.4.mlp.down_proj.weight": "pytorch_model-00002-of-00010.bin",
|
316 |
+
"model.layers.4.mlp.gate_proj.weight": "pytorch_model-00002-of-00010.bin",
|
317 |
+
"model.layers.4.mlp.up_proj.weight": "pytorch_model-00002-of-00010.bin",
|
318 |
+
"model.layers.4.post_attention_layernorm.weight": "pytorch_model-00002-of-00010.bin",
|
319 |
+
"model.layers.4.self_attn.k_proj.weight": "pytorch_model-00002-of-00010.bin",
|
320 |
+
"model.layers.4.self_attn.o_proj.weight": "pytorch_model-00002-of-00010.bin",
|
321 |
+
"model.layers.4.self_attn.q_proj.weight": "pytorch_model-00002-of-00010.bin",
|
322 |
+
"model.layers.4.self_attn.v_proj.weight": "pytorch_model-00002-of-00010.bin",
|
323 |
+
"model.layers.5.input_layernorm.weight": "pytorch_model-00002-of-00010.bin",
|
324 |
+
"model.layers.5.mlp.down_proj.weight": "pytorch_model-00002-of-00010.bin",
|
325 |
+
"model.layers.5.mlp.gate_proj.weight": "pytorch_model-00002-of-00010.bin",
|
326 |
+
"model.layers.5.mlp.up_proj.weight": "pytorch_model-00002-of-00010.bin",
|
327 |
+
"model.layers.5.post_attention_layernorm.weight": "pytorch_model-00002-of-00010.bin",
|
328 |
+
"model.layers.5.self_attn.k_proj.weight": "pytorch_model-00002-of-00010.bin",
|
329 |
+
"model.layers.5.self_attn.o_proj.weight": "pytorch_model-00002-of-00010.bin",
|
330 |
+
"model.layers.5.self_attn.q_proj.weight": "pytorch_model-00002-of-00010.bin",
|
331 |
+
"model.layers.5.self_attn.v_proj.weight": "pytorch_model-00002-of-00010.bin",
|
332 |
+
"model.layers.6.input_layernorm.weight": "pytorch_model-00003-of-00010.bin",
|
333 |
+
"model.layers.6.mlp.down_proj.weight": "pytorch_model-00003-of-00010.bin",
|
334 |
+
"model.layers.6.mlp.gate_proj.weight": "pytorch_model-00002-of-00010.bin",
|
335 |
+
"model.layers.6.mlp.up_proj.weight": "pytorch_model-00003-of-00010.bin",
|
336 |
+
"model.layers.6.post_attention_layernorm.weight": "pytorch_model-00003-of-00010.bin",
|
337 |
+
"model.layers.6.self_attn.k_proj.weight": "pytorch_model-00002-of-00010.bin",
|
338 |
+
"model.layers.6.self_attn.o_proj.weight": "pytorch_model-00002-of-00010.bin",
|
339 |
+
"model.layers.6.self_attn.q_proj.weight": "pytorch_model-00002-of-00010.bin",
|
340 |
+
"model.layers.6.self_attn.v_proj.weight": "pytorch_model-00002-of-00010.bin",
|
341 |
+
"model.layers.7.input_layernorm.weight": "pytorch_model-00003-of-00010.bin",
|
342 |
+
"model.layers.7.mlp.down_proj.weight": "pytorch_model-00003-of-00010.bin",
|
343 |
+
"model.layers.7.mlp.gate_proj.weight": "pytorch_model-00003-of-00010.bin",
|
344 |
+
"model.layers.7.mlp.up_proj.weight": "pytorch_model-00003-of-00010.bin",
|
345 |
+
"model.layers.7.post_attention_layernorm.weight": "pytorch_model-00003-of-00010.bin",
|
346 |
+
"model.layers.7.self_attn.k_proj.weight": "pytorch_model-00003-of-00010.bin",
|
347 |
+
"model.layers.7.self_attn.o_proj.weight": "pytorch_model-00003-of-00010.bin",
|
348 |
+
"model.layers.7.self_attn.q_proj.weight": "pytorch_model-00003-of-00010.bin",
|
349 |
+
"model.layers.7.self_attn.v_proj.weight": "pytorch_model-00003-of-00010.bin",
|
350 |
+
"model.layers.8.input_layernorm.weight": "pytorch_model-00003-of-00010.bin",
|
351 |
+
"model.layers.8.mlp.down_proj.weight": "pytorch_model-00003-of-00010.bin",
|
352 |
+
"model.layers.8.mlp.gate_proj.weight": "pytorch_model-00003-of-00010.bin",
|
353 |
+
"model.layers.8.mlp.up_proj.weight": "pytorch_model-00003-of-00010.bin",
|
354 |
+
"model.layers.8.post_attention_layernorm.weight": "pytorch_model-00003-of-00010.bin",
|
355 |
+
"model.layers.8.self_attn.k_proj.weight": "pytorch_model-00003-of-00010.bin",
|
356 |
+
"model.layers.8.self_attn.o_proj.weight": "pytorch_model-00003-of-00010.bin",
|
357 |
+
"model.layers.8.self_attn.q_proj.weight": "pytorch_model-00003-of-00010.bin",
|
358 |
+
"model.layers.8.self_attn.v_proj.weight": "pytorch_model-00003-of-00010.bin",
|
359 |
+
"model.layers.9.input_layernorm.weight": "pytorch_model-00003-of-00010.bin",
|
360 |
+
"model.layers.9.mlp.down_proj.weight": "pytorch_model-00003-of-00010.bin",
|
361 |
+
"model.layers.9.mlp.gate_proj.weight": "pytorch_model-00003-of-00010.bin",
|
362 |
+
"model.layers.9.mlp.up_proj.weight": "pytorch_model-00003-of-00010.bin",
|
363 |
+
"model.layers.9.post_attention_layernorm.weight": "pytorch_model-00003-of-00010.bin",
|
364 |
+
"model.layers.9.self_attn.k_proj.weight": "pytorch_model-00003-of-00010.bin",
|
365 |
+
"model.layers.9.self_attn.o_proj.weight": "pytorch_model-00003-of-00010.bin",
|
366 |
+
"model.layers.9.self_attn.q_proj.weight": "pytorch_model-00003-of-00010.bin",
|
367 |
+
"model.layers.9.self_attn.v_proj.weight": "pytorch_model-00003-of-00010.bin",
|
368 |
+
"model.norm.weight": "pytorch_model-00010-of-00010.bin"
|
369 |
+
}
|
370 |
+
}
|
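The `weight_map` above is the standard sharded-checkpoint index: it maps each tensor name to the one shard file (of ten) that stores it. A minimal sketch, assuming this layout, of grouping tensor names by shard so each `.bin` file needs to be opened only once; the inlined `index` dict is a tiny excerpt of the full map for illustration:

```python
from collections import defaultdict

def group_by_shard(index):
    """Invert weight_map: shard filename -> list of tensor names stored there."""
    shards = defaultdict(list)
    for tensor_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(tensor_name)
    return dict(shards)

# Tiny excerpt of pytorch_model.bin.index.json, inlined for illustration.
index = {
    "weight_map": {
        "model.layers.24.mlp.gate_proj.weight": "pytorch_model-00006-of-00010.bin",
        "model.layers.24.mlp.up_proj.weight": "pytorch_model-00007-of-00010.bin",
        "model.norm.weight": "pytorch_model-00010-of-00010.bin",
    }
}

groups = group_by_shard(index)
print(groups["pytorch_model-00010-of-00010.bin"])  # ['model.norm.weight']
```

Loaders such as `transformers` follow the same principle when resolving a sharded checkpoint: read the index once, then load each shard and pick out the tensors assigned to it.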
special_tokens_map.json
ADDED
@@ -0,0 +1,30 @@
+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<pad>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
tokenizer.json
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b0240ce510f08e6c2041724e9043e33be9d251d1e4a4d94eb68cd47b954b61d2
+size 17078292
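Note that `tokenizer.json` is tracked with Git LFS (per the `.gitattributes` rule added in this commit), so the repository stores only a small pointer file: a version line, the SHA-256 of the real content, and its size in bytes. A minimal sketch of parsing a pointer of this shape into its fields:

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a {key: value} dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

# The pointer stored in place of tokenizer.json above.
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:b0240ce510f08e6c2041724e9043e33be9d251d1e4a4d94eb68cd47b954b61d2
size 17078292
"""

fields = parse_lfs_pointer(pointer)
print(fields["size"])  # byte size of the actual tokenizer.json: 17078292
```

Tools that clone the repo without LFS support see only this pointer; `git lfs pull` (or the Hub's download clients) replaces it with the ~17 MB tokenizer file whose hash matches `oid`.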
tokenizer_config.json
ADDED
The diff for this file is too large to render.