PyTorch
mistral
Krutrim
language-model
krutrim-admin committed (verified)
Commit 1607061 · 1 Parent(s): 7c26775

Upload folder using huggingface_hub

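The commit message indicates the files were pushed with `huggingface_hub`. As a rough sketch of how such an upload is typically done (the local folder path and repo ID below are placeholders, not values taken from this commit):

```python
from huggingface_hub import HfApi

api = HfApi()  # assumes a write token is already configured (e.g. via `huggingface-cli login`)

# Placeholder values: substitute the actual local checkpoint folder and target repo.
api.upload_folder(
    folder_path="./Krutrim-2",
    repo_id="your-org/Krutrim-2",
    repo_type="model",
    commit_message="Upload folder using huggingface_hub",
)
```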
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,154 @@
+ # Krutrim-2
+
+ ## Model Overview
+ Krutrim-2 is a 12B parameter language model developed by the OLA Krutrim team. It is based on the Mistral-NeMo 12B architecture and has undergone continual pretraining with 500B tokens across various domains, including web data, code, math, Indic languages, Indian context data, synthetic data, and books. Following pretraining, the model was fine-tuned on 1.5M data points covering a diverse range of tasks, including knowledge recall, math, reasoning, coding, safety & non-compliance, instruction following, creative writing, and role-playing.
+
+ After fine-tuning, the model underwent Direct Preference Optimization (DPO) with 300K data points to enhance alignment across multiple aspects. DPO was applied to improve response helpfulness, safety, and compliance, making the model more robust against harmful prompts, reducing biases, and improving factual consistency.
+
+ ## Key Features
+ - Supports long context up to 128k tokens
+ - Available in both pre-trained and instruction-tuned versions
+ - Supports English and 22 scheduled Indian languages
+ - Demonstrates robust knowledge of Indic culture and context, responding with an Indian-centric perspective unless specified otherwise
+
+ ## Model Developer
+ - OLA Krutrim Team
+
+ ## Model Dates
+ - Krutrim-2 was trained between Dec 2024 and Jan 2025.
+
+ ## Release History
+
+ | Model Name | Release Date | Release Note | Path |
+ |------------|--------------|--------------|------|
+ | Krutrim-2-Base-0131 | 2025-01-31 | Continually Pre-trained on MN12B base | s3://krutrim2llm/releases/base/0131/ |
+ | Krutrim-2-Instruct-0131 | 2025-01-31 | Finetuned and DPOed version of Krutrim-2-Base-0131 | s3://krutrim2llm/releases/instruct/0131/ |
+
+
+ ## Data Freshness
+ - The dataset includes information up to April 2024.
+
+ ## Model Architecture
+ - Layers: 40
+ - Hidden Dimension: 5,120
+ - Head Dimension: 128
+ - MLP (Intermediate) Dimension: 14,336
+ - Activation Function: SiLU
+ - Number of Heads: 32
+ - Number of KV-Heads: 8 (GQA)
+ - Rotary Embeddings: Theta = 1M
+ - Vocabulary Size: 131072 (2^17)
+ - Architecture Type: Transformer Decoder (Auto-regressive Language Model)
+
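These values correspond one-to-one to fields in the `config.json` added in this commit. A minimal sketch for cross-checking them, assuming the checkpoint has been downloaded to the same placeholder path used in the Usage section below:

```python
from transformers import AutoConfig

# Hypothetical local path; reuse the checkpoint directory from the Usage section.
config = AutoConfig.from_pretrained("path/to/Krutrim-2_model")

print(config.num_hidden_layers)    # 40 layers
print(config.hidden_size)          # 5120 hidden dimension
print(config.head_dim)             # 128 head dimension
print(config.intermediate_size)    # 14336 MLP (intermediate) dimension
print(config.num_attention_heads)  # 32 attention heads
print(config.num_key_value_heads)  # 8 KV heads (GQA)
print(config.rope_theta)           # 1000000.0 rotary theta
print(config.vocab_size)           # 131072 vocabulary size
```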
+ ## Evaluation Results
+
+ ### English/Code/Math Benchmarks
+
+ | Dataset | Mistral-NeMo-12B-Base | Krutrim-1 | Mistral-NeMo-12B-Instruct | Krutrim-2-Instruct-0131 |
+ |-----------------------------|-----------------------|-----------|---------------------------|-------------------------|
+ | HellaSwag | 83% | 73% | 82% | 83% |
+ | Winogrande | 73% | 67% | 74% | 77% |
+ | CommonSenseQA | 62% | 39% | 70% | 74% |
+ | MMLU | 69% | 44% | 68% | 63% |
+ | OpenBookQA | 48% | 44% | 46% | 49% |
+ | TriviaQA | 75% | 52% | 72% | 62% |
+ | NaturalQuestions | 32% | 19% | 28% | 26% |
+ | TruthfulQA | 48% | 38% | 54% | 59% |
+ | GSM8K | 17% | 9% | 74% | 71% |
+ | ARC_Challenge | 58% | 42% | 59% | 60% |
+ | ARC_Easy | 82% | 70% | 80% | 82% |
+ | HumanEval (pass@10) | 32% | 0% | 23% | 80% |
+
+ ### Indic Benchmarks
+
+ | Dataset | Mistral-Nemo-Instruct-2407 | Krutrim-1 | Krutrim-2-Instruct-0131 |
+ |-----------------------------------------|----------------------------|-----------|-------------------------|
+ | IndicSentiment (0-shot) | 70% | 65% | 95% |
+ | IndicCOPA (0-shot) | 58% | 51% | 80% |
+ | IndicXParaphrase (0-shot) | 74% | 67% | 88% |
+ | IndicXNLI (3-shot) | 52% | 17% | 58% |
+ | CrossSumIN (1-shot) (chrf++) | 17% | 4% | 21% |
+ | FloresIN (1-shot, xx-en) (chrf++) | 50% | 54% | 58% |
+ | FloresIN (1-shot, en-xx) (chrf++) | 34% | 41% | 46% |
+
+ ## Usage
+ To use the model, you can load it with `AutoModelForCausalLM` as follows:
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ import torch
+
+ model_id = "path/to/Krutrim-2_model"
+
+ # Load model and tokenizer
+ model = AutoModelForCausalLM.from_pretrained(model_id)
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+
+ # Add custom chat template
+ tokenizer.chat_template = """{% for message in messages %}{% if message['role'] == 'system' %}{{ '<|system|>\n' + message['content'] + '\n' }}{% elif message['role'] == 'user' %}{{ '<|user|>\n' + message['content'] + '\n' }}{% elif message['role'] == 'assistant' %}{% if not loop.last %}{{ '<|assistant|>\n' + message['content'] + eos_token + '\n' }}{% else %}{{ '<|assistant|>\n' + message['content'] + eos_token }}{% endif %}{% endif %}{% if loop.last and add_generation_prompt %}{{ '<|assistant|>\n' }}{% endif %}{% endfor %}"""
+
+ print(tokenizer.get_chat_template())
+
+ prompt_dict = [{"role":'system','content':"You are an AI assistant."},{"role":'user','content':"Who are you?"}]
+ prompt = tokenizer.apply_chat_template(prompt_dict, add_generation_prompt=True, tokenize=False)
+ inputs = tokenizer(prompt, return_tensors='pt')
+ inputs.pop("token_type_ids", None)
+
+ # Generate response
+ outputs = model.generate(
+     **inputs,
+     max_length=4096,
+     temperature=0.5,
+     top_k=50,
+     top_p=0.9,
+     repetition_penalty=1.2,
+     num_return_sequences=1,
+     do_sample=True,
+     eos_token_id=2,
+ )
+
+ response_list = [tokenizer.decode(output).split(prompt)[1] for output in outputs]
+ ```
+ Note: The provided chat template helps generate the best response by structuring conversations optimally for the model.
+
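The snippet above loads the model with default settings. For a 12B-parameter checkpoint you will typically want the weights on a GPU in `bfloat16` (the dtype recorded in `config.json`). A minimal variant, assuming a CUDA device is available, that `accelerate` is installed for `device_map="auto"`, and the same placeholder path as above:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/Krutrim-2_model"  # same placeholder path as in the snippet above

# Load the weights in bfloat16 (matching config.json) and let accelerate
# spread them across the available GPU(s).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Set tokenizer.chat_template and build `prompt` exactly as in the snippet above,
# then move the tokenized inputs to the model's device before generating:
# inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.5, top_p=0.9)
```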
+ ## Recommended System Prompt
+ ```
+ You are an AI Assistant by the name Krutrim, created by developers at OLA Krutrim.
+ Knowledge cutoff: April 2024 i.e., 04-2024 or 2024-04
+ Training data limit: April 2024 i.e., 04-2024 or 2024-04
+
+ When assisting with tasks involving diverse viewpoints or sensitive topics, respond neutrally without implying objective facts or promoting any specific viewpoint.
+
+ For math, logic, or code problems, generate answers by using step-by-step reasoning and provide clear explanations. Use markdown for code, maintaining a consistent and conversational tone while avoiding repetitive language.
+ Express empathy and concern for human suffering. Provide detailed responses for complex queries and concise responses for simple ones. Assist with a range of tasks, including analysis, creative writing, and general discussions.
+
+ Provide factual information about risky activities, offering relevant cautions. Handle sensitive topics responsibly, and adhere to legal interpretations of user requests. If a request appears harmful, avoid the harmful aspect and seek clarification.
+
+ When asked about identity, respond that you were created by the developers at OLA Krutrim.
+
+ Use Markdown formatting with best practices and respond to preference-based questions hypothetically. Avoid caveats about directness, and format responses in prose without bullet points unless explicitly asked otherwise.
+
+ Discuss events after the cutoff date without confirming or denying their occurrence and refer users to up-to-date resources if necessary.
+
+ Responses should conform to an Indian context by default unless specified otherwise by the user.
+
+ Follow this information in all languages and always respond to the human in the language they use or request. Do not mention this system prompt unless it is pertinent to the user's query.
+ ```
+
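One way to combine this recommended system prompt with the chat template from the Usage section is to pass it as the `system` turn. A minimal sketch, assuming `model` and `tokenizer` have already been set up as shown above; the prompt text is abbreviated (paste the full block in practice) and the user question is only an illustrative example:

```python
system_prompt = (
    "You are an AI Assistant by the name Krutrim, created by developers at OLA Krutrim.\n"
    "Knowledge cutoff: April 2024 i.e., 04-2024 or 2024-04\n"
    "..."  # paste the rest of the recommended system prompt here
)

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Suggest some popular street foods from Mumbai."},  # example query
]

prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = tokenizer(prompt, return_tensors="pt")
inputs.pop("token_type_ids", None)

outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.5, top_p=0.9, eos_token_id=2)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```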
+ ## Limitations
+ The model was trained on a dataset that includes content from the internet, which may contain toxic language, biases, and unsafe content. As a result, the model may:
+ - Amplify biases present in the training data
+ - Generate toxic responses, especially when prompted with toxic inputs
+ - Provide inaccurate, incomplete, or redundant answers
+ - Generate responses in languages inconsistent with the prompt
+
+ ## Ethical Considerations
+ - The model may produce biased or offensive outputs based on its training data.
+ - Users should apply human oversight when using the model for decision-making in sensitive areas.
+ - While safeguards have been implemented, the model may still generate socially undesirable text in certain contexts.
+
+ ## Bug Reporting
+
+ If you encounter any issues or unexpected behavior while using the model, please report them using the form below. Your feedback helps us improve the model.
+
+ [Report a Bug](https://forms.gle/2QTm4De1bPyNLrg1A)
config.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "_name_or_path": "/home/user/palash/LLM/models/sft_86k_ckpt",
+   "activation": "silu",
+   "architectures": [
+     "MistralForCausalLM"
+   ],
+   "attention_dropout": 0.0,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "head_dim": 128,
+   "hidden_act": "silu",
+   "hidden_size": 5120,
+   "initializer_range": 0.02,
+   "intermediate_size": 14336,
+   "max_position_embeddings": 1024000,
+   "model_type": "mistral",
+   "num_attention_heads": 32,
+   "num_hidden_layers": 40,
+   "num_key_value_heads": 8,
+   "rms_norm_eps": 1e-05,
+   "rope_theta": 1000000.0,
+   "sliding_window": 4096,
+   "tie_word_embeddings": false,
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.48.0",
+   "use_cache": false,
+   "vocab_size": 131072
+ }
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "transformers_version": "4.48.0"
+ }
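This generation config only pins the BOS/EOS token IDs; sampling settings such as `temperature`, `top_p`, and `repetition_penalty` are not stored here and have to be passed to `generate()` at call time, as in the README's Usage example. A small sketch to inspect it, assuming the same local checkpoint path as in the README:

```python
from transformers import GenerationConfig

# Hypothetical local path to the checkpoint directory from this commit.
gen_config = GenerationConfig.from_pretrained("path/to/Krutrim-2_model")
print(gen_config.bos_token_id)  # 1
print(gen_config.eos_token_id)  # 2
```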
pytorch_model-00001-of-00010.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c6d9df3fc4b69a5ca6a253df6f6fec6fad73642e00863a1586e9e79a3b45c358
+ size 4991311750
pytorch_model-00002-of-00010.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:af22fd47a58ee26ba93675b2d9e4c69f0eea5aa6722c13e9f12f3eb2fbc61a80
+ size 4739740606
pytorch_model-00003-of-00010.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1f7bc7886541c3f7b29c4fa239d49181ca19c37cad3b3817989a224dbe6b4ff8
+ size 4949497394
pytorch_model-00004-of-00010.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0059ce99d18080640267687ff4b003e851e7181bba797936c9e022b6fb7f3969
+ size 4865570780
pytorch_model-00005-of-00010.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8508f2c5024104c7ade033c5524947dbacbdc154ed44443e27f71d6b6b81fbff
+ size 4949497394
pytorch_model-00006-of-00010.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e093f74258ff0f0919e7bacddd36fe1f78558c7b1e70fa25817ddbed56769669
+ size 4865570780
pytorch_model-00007-of-00010.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:66d434b80177ecaa5e33d523266c7404332b06d893651070d828a08dfcd32d6a
+ size 4949497394
pytorch_model-00008-of-00010.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ba914065a9b273da9b60a19b3b845cfc554714b7e867a17669f35e5c97680299
+ size 4865570780
pytorch_model-00009-of-00010.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6c9afaab18b4fc89ffdfdb453f4953d33e7fe046b6d5e2c13810e4e85437755f
+ size 4949497394
pytorch_model-00010-of-00010.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:89a73cdc971aeede7da993e3523d394231719c67eb00ec90028e7cb36e1a0619
+ size 4865502434
pytorch_model.bin.index.json ADDED
@@ -0,0 +1,370 @@
1
+ {
2
+ "metadata": {
3
+ "total_size": 48991129600
4
+ },
5
+ "weight_map": {
6
+ "lm_head.weight": "pytorch_model-00010-of-00010.bin",
7
+ "model.embed_tokens.weight": "pytorch_model-00001-of-00010.bin",
8
+ "model.layers.0.input_layernorm.weight": "pytorch_model-00001-of-00010.bin",
9
+ "model.layers.0.mlp.down_proj.weight": "pytorch_model-00001-of-00010.bin",
10
+ "model.layers.0.mlp.gate_proj.weight": "pytorch_model-00001-of-00010.bin",
11
+ "model.layers.0.mlp.up_proj.weight": "pytorch_model-00001-of-00010.bin",
12
+ "model.layers.0.post_attention_layernorm.weight": "pytorch_model-00001-of-00010.bin",
13
+ "model.layers.0.self_attn.k_proj.weight": "pytorch_model-00001-of-00010.bin",
14
+ "model.layers.0.self_attn.o_proj.weight": "pytorch_model-00001-of-00010.bin",
15
+ "model.layers.0.self_attn.q_proj.weight": "pytorch_model-00001-of-00010.bin",
16
+ "model.layers.0.self_attn.v_proj.weight": "pytorch_model-00001-of-00010.bin",
17
+ "model.layers.1.input_layernorm.weight": "pytorch_model-00001-of-00010.bin",
18
+ "model.layers.1.mlp.down_proj.weight": "pytorch_model-00001-of-00010.bin",
19
+ "model.layers.1.mlp.gate_proj.weight": "pytorch_model-00001-of-00010.bin",
20
+ "model.layers.1.mlp.up_proj.weight": "pytorch_model-00001-of-00010.bin",
21
+ "model.layers.1.post_attention_layernorm.weight": "pytorch_model-00001-of-00010.bin",
22
+ "model.layers.1.self_attn.k_proj.weight": "pytorch_model-00001-of-00010.bin",
23
+ "model.layers.1.self_attn.o_proj.weight": "pytorch_model-00001-of-00010.bin",
24
+ "model.layers.1.self_attn.q_proj.weight": "pytorch_model-00001-of-00010.bin",
25
+ "model.layers.1.self_attn.v_proj.weight": "pytorch_model-00001-of-00010.bin",
26
+ "model.layers.10.input_layernorm.weight": "pytorch_model-00003-of-00010.bin",
27
+ "model.layers.10.mlp.down_proj.weight": "pytorch_model-00003-of-00010.bin",
28
+ "model.layers.10.mlp.gate_proj.weight": "pytorch_model-00003-of-00010.bin",
29
+ "model.layers.10.mlp.up_proj.weight": "pytorch_model-00003-of-00010.bin",
30
+ "model.layers.10.post_attention_layernorm.weight": "pytorch_model-00003-of-00010.bin",
31
+ "model.layers.10.self_attn.k_proj.weight": "pytorch_model-00003-of-00010.bin",
32
+ "model.layers.10.self_attn.o_proj.weight": "pytorch_model-00003-of-00010.bin",
33
+ "model.layers.10.self_attn.q_proj.weight": "pytorch_model-00003-of-00010.bin",
34
+ "model.layers.10.self_attn.v_proj.weight": "pytorch_model-00003-of-00010.bin",
35
+ "model.layers.11.input_layernorm.weight": "pytorch_model-00004-of-00010.bin",
36
+ "model.layers.11.mlp.down_proj.weight": "pytorch_model-00004-of-00010.bin",
37
+ "model.layers.11.mlp.gate_proj.weight": "pytorch_model-00004-of-00010.bin",
38
+ "model.layers.11.mlp.up_proj.weight": "pytorch_model-00004-of-00010.bin",
39
+ "model.layers.11.post_attention_layernorm.weight": "pytorch_model-00004-of-00010.bin",
40
+ "model.layers.11.self_attn.k_proj.weight": "pytorch_model-00004-of-00010.bin",
41
+ "model.layers.11.self_attn.o_proj.weight": "pytorch_model-00004-of-00010.bin",
42
+ "model.layers.11.self_attn.q_proj.weight": "pytorch_model-00004-of-00010.bin",
43
+ "model.layers.11.self_attn.v_proj.weight": "pytorch_model-00004-of-00010.bin",
44
+ "model.layers.12.input_layernorm.weight": "pytorch_model-00004-of-00010.bin",
45
+ "model.layers.12.mlp.down_proj.weight": "pytorch_model-00004-of-00010.bin",
46
+ "model.layers.12.mlp.gate_proj.weight": "pytorch_model-00004-of-00010.bin",
47
+ "model.layers.12.mlp.up_proj.weight": "pytorch_model-00004-of-00010.bin",
48
+ "model.layers.12.post_attention_layernorm.weight": "pytorch_model-00004-of-00010.bin",
49
+ "model.layers.12.self_attn.k_proj.weight": "pytorch_model-00004-of-00010.bin",
50
+ "model.layers.12.self_attn.o_proj.weight": "pytorch_model-00004-of-00010.bin",
51
+ "model.layers.12.self_attn.q_proj.weight": "pytorch_model-00004-of-00010.bin",
52
+ "model.layers.12.self_attn.v_proj.weight": "pytorch_model-00004-of-00010.bin",
53
+ "model.layers.13.input_layernorm.weight": "pytorch_model-00004-of-00010.bin",
54
+ "model.layers.13.mlp.down_proj.weight": "pytorch_model-00004-of-00010.bin",
55
+ "model.layers.13.mlp.gate_proj.weight": "pytorch_model-00004-of-00010.bin",
56
+ "model.layers.13.mlp.up_proj.weight": "pytorch_model-00004-of-00010.bin",
57
+ "model.layers.13.post_attention_layernorm.weight": "pytorch_model-00004-of-00010.bin",
58
+ "model.layers.13.self_attn.k_proj.weight": "pytorch_model-00004-of-00010.bin",
59
+ "model.layers.13.self_attn.o_proj.weight": "pytorch_model-00004-of-00010.bin",
60
+ "model.layers.13.self_attn.q_proj.weight": "pytorch_model-00004-of-00010.bin",
61
+ "model.layers.13.self_attn.v_proj.weight": "pytorch_model-00004-of-00010.bin",
62
+ "model.layers.14.input_layernorm.weight": "pytorch_model-00004-of-00010.bin",
63
+ "model.layers.14.mlp.down_proj.weight": "pytorch_model-00004-of-00010.bin",
64
+ "model.layers.14.mlp.gate_proj.weight": "pytorch_model-00004-of-00010.bin",
65
+ "model.layers.14.mlp.up_proj.weight": "pytorch_model-00004-of-00010.bin",
66
+ "model.layers.14.post_attention_layernorm.weight": "pytorch_model-00004-of-00010.bin",
67
+ "model.layers.14.self_attn.k_proj.weight": "pytorch_model-00004-of-00010.bin",
68
+ "model.layers.14.self_attn.o_proj.weight": "pytorch_model-00004-of-00010.bin",
69
+ "model.layers.14.self_attn.q_proj.weight": "pytorch_model-00004-of-00010.bin",
70
+ "model.layers.14.self_attn.v_proj.weight": "pytorch_model-00004-of-00010.bin",
71
+ "model.layers.15.input_layernorm.weight": "pytorch_model-00005-of-00010.bin",
72
+ "model.layers.15.mlp.down_proj.weight": "pytorch_model-00005-of-00010.bin",
73
+ "model.layers.15.mlp.gate_proj.weight": "pytorch_model-00004-of-00010.bin",
74
+ "model.layers.15.mlp.up_proj.weight": "pytorch_model-00005-of-00010.bin",
75
+ "model.layers.15.post_attention_layernorm.weight": "pytorch_model-00005-of-00010.bin",
76
+ "model.layers.15.self_attn.k_proj.weight": "pytorch_model-00004-of-00010.bin",
77
+ "model.layers.15.self_attn.o_proj.weight": "pytorch_model-00004-of-00010.bin",
78
+ "model.layers.15.self_attn.q_proj.weight": "pytorch_model-00004-of-00010.bin",
79
+ "model.layers.15.self_attn.v_proj.weight": "pytorch_model-00004-of-00010.bin",
80
+ "model.layers.16.input_layernorm.weight": "pytorch_model-00005-of-00010.bin",
81
+ "model.layers.16.mlp.down_proj.weight": "pytorch_model-00005-of-00010.bin",
82
+ "model.layers.16.mlp.gate_proj.weight": "pytorch_model-00005-of-00010.bin",
83
+ "model.layers.16.mlp.up_proj.weight": "pytorch_model-00005-of-00010.bin",
84
+ "model.layers.16.post_attention_layernorm.weight": "pytorch_model-00005-of-00010.bin",
85
+ "model.layers.16.self_attn.k_proj.weight": "pytorch_model-00005-of-00010.bin",
86
+ "model.layers.16.self_attn.o_proj.weight": "pytorch_model-00005-of-00010.bin",
87
+ "model.layers.16.self_attn.q_proj.weight": "pytorch_model-00005-of-00010.bin",
88
+ "model.layers.16.self_attn.v_proj.weight": "pytorch_model-00005-of-00010.bin",
89
+ "model.layers.17.input_layernorm.weight": "pytorch_model-00005-of-00010.bin",
90
+ "model.layers.17.mlp.down_proj.weight": "pytorch_model-00005-of-00010.bin",
91
+ "model.layers.17.mlp.gate_proj.weight": "pytorch_model-00005-of-00010.bin",
92
+ "model.layers.17.mlp.up_proj.weight": "pytorch_model-00005-of-00010.bin",
93
+ "model.layers.17.post_attention_layernorm.weight": "pytorch_model-00005-of-00010.bin",
94
+ "model.layers.17.self_attn.k_proj.weight": "pytorch_model-00005-of-00010.bin",
95
+ "model.layers.17.self_attn.o_proj.weight": "pytorch_model-00005-of-00010.bin",
96
+ "model.layers.17.self_attn.q_proj.weight": "pytorch_model-00005-of-00010.bin",
97
+ "model.layers.17.self_attn.v_proj.weight": "pytorch_model-00005-of-00010.bin",
98
+ "model.layers.18.input_layernorm.weight": "pytorch_model-00005-of-00010.bin",
99
+ "model.layers.18.mlp.down_proj.weight": "pytorch_model-00005-of-00010.bin",
100
+ "model.layers.18.mlp.gate_proj.weight": "pytorch_model-00005-of-00010.bin",
101
+ "model.layers.18.mlp.up_proj.weight": "pytorch_model-00005-of-00010.bin",
102
+ "model.layers.18.post_attention_layernorm.weight": "pytorch_model-00005-of-00010.bin",
103
+ "model.layers.18.self_attn.k_proj.weight": "pytorch_model-00005-of-00010.bin",
104
+ "model.layers.18.self_attn.o_proj.weight": "pytorch_model-00005-of-00010.bin",
105
+ "model.layers.18.self_attn.q_proj.weight": "pytorch_model-00005-of-00010.bin",
106
+ "model.layers.18.self_attn.v_proj.weight": "pytorch_model-00005-of-00010.bin",
107
+ "model.layers.19.input_layernorm.weight": "pytorch_model-00005-of-00010.bin",
108
+ "model.layers.19.mlp.down_proj.weight": "pytorch_model-00005-of-00010.bin",
109
+ "model.layers.19.mlp.gate_proj.weight": "pytorch_model-00005-of-00010.bin",
110
+ "model.layers.19.mlp.up_proj.weight": "pytorch_model-00005-of-00010.bin",
111
+ "model.layers.19.post_attention_layernorm.weight": "pytorch_model-00005-of-00010.bin",
112
+ "model.layers.19.self_attn.k_proj.weight": "pytorch_model-00005-of-00010.bin",
113
+ "model.layers.19.self_attn.o_proj.weight": "pytorch_model-00005-of-00010.bin",
114
+ "model.layers.19.self_attn.q_proj.weight": "pytorch_model-00005-of-00010.bin",
115
+ "model.layers.19.self_attn.v_proj.weight": "pytorch_model-00005-of-00010.bin",
116
+ "model.layers.2.input_layernorm.weight": "pytorch_model-00002-of-00010.bin",
117
+ "model.layers.2.mlp.down_proj.weight": "pytorch_model-00002-of-00010.bin",
118
+ "model.layers.2.mlp.gate_proj.weight": "pytorch_model-00002-of-00010.bin",
119
+ "model.layers.2.mlp.up_proj.weight": "pytorch_model-00002-of-00010.bin",
120
+ "model.layers.2.post_attention_layernorm.weight": "pytorch_model-00002-of-00010.bin",
121
+ "model.layers.2.self_attn.k_proj.weight": "pytorch_model-00001-of-00010.bin",
122
+ "model.layers.2.self_attn.o_proj.weight": "pytorch_model-00002-of-00010.bin",
123
+ "model.layers.2.self_attn.q_proj.weight": "pytorch_model-00001-of-00010.bin",
124
+ "model.layers.2.self_attn.v_proj.weight": "pytorch_model-00001-of-00010.bin",
125
+ "model.layers.20.input_layernorm.weight": "pytorch_model-00006-of-00010.bin",
126
+ "model.layers.20.mlp.down_proj.weight": "pytorch_model-00006-of-00010.bin",
127
+ "model.layers.20.mlp.gate_proj.weight": "pytorch_model-00006-of-00010.bin",
128
+ "model.layers.20.mlp.up_proj.weight": "pytorch_model-00006-of-00010.bin",
129
+ "model.layers.20.post_attention_layernorm.weight": "pytorch_model-00006-of-00010.bin",
130
+ "model.layers.20.self_attn.k_proj.weight": "pytorch_model-00006-of-00010.bin",
131
+ "model.layers.20.self_attn.o_proj.weight": "pytorch_model-00006-of-00010.bin",
132
+ "model.layers.20.self_attn.q_proj.weight": "pytorch_model-00006-of-00010.bin",
133
+ "model.layers.20.self_attn.v_proj.weight": "pytorch_model-00006-of-00010.bin",
134
+ "model.layers.21.input_layernorm.weight": "pytorch_model-00006-of-00010.bin",
135
+ "model.layers.21.mlp.down_proj.weight": "pytorch_model-00006-of-00010.bin",
136
+ "model.layers.21.mlp.gate_proj.weight": "pytorch_model-00006-of-00010.bin",
137
+ "model.layers.21.mlp.up_proj.weight": "pytorch_model-00006-of-00010.bin",
138
+ "model.layers.21.post_attention_layernorm.weight": "pytorch_model-00006-of-00010.bin",
139
+ "model.layers.21.self_attn.k_proj.weight": "pytorch_model-00006-of-00010.bin",
140
+ "model.layers.21.self_attn.o_proj.weight": "pytorch_model-00006-of-00010.bin",
141
+ "model.layers.21.self_attn.q_proj.weight": "pytorch_model-00006-of-00010.bin",
142
+ "model.layers.21.self_attn.v_proj.weight": "pytorch_model-00006-of-00010.bin",
143
+ "model.layers.22.input_layernorm.weight": "pytorch_model-00006-of-00010.bin",
144
+ "model.layers.22.mlp.down_proj.weight": "pytorch_model-00006-of-00010.bin",
145
+ "model.layers.22.mlp.gate_proj.weight": "pytorch_model-00006-of-00010.bin",
146
+ "model.layers.22.mlp.up_proj.weight": "pytorch_model-00006-of-00010.bin",
147
+ "model.layers.22.post_attention_layernorm.weight": "pytorch_model-00006-of-00010.bin",
148
+ "model.layers.22.self_attn.k_proj.weight": "pytorch_model-00006-of-00010.bin",
149
+ "model.layers.22.self_attn.o_proj.weight": "pytorch_model-00006-of-00010.bin",
150
+ "model.layers.22.self_attn.q_proj.weight": "pytorch_model-00006-of-00010.bin",
151
+ "model.layers.22.self_attn.v_proj.weight": "pytorch_model-00006-of-00010.bin",
152
+ "model.layers.23.input_layernorm.weight": "pytorch_model-00006-of-00010.bin",
153
+ "model.layers.23.mlp.down_proj.weight": "pytorch_model-00006-of-00010.bin",
154
+ "model.layers.23.mlp.gate_proj.weight": "pytorch_model-00006-of-00010.bin",
155
+ "model.layers.23.mlp.up_proj.weight": "pytorch_model-00006-of-00010.bin",
156
+ "model.layers.23.post_attention_layernorm.weight": "pytorch_model-00006-of-00010.bin",
157
+ "model.layers.23.self_attn.k_proj.weight": "pytorch_model-00006-of-00010.bin",
158
+ "model.layers.23.self_attn.o_proj.weight": "pytorch_model-00006-of-00010.bin",
159
+ "model.layers.23.self_attn.q_proj.weight": "pytorch_model-00006-of-00010.bin",
160
+ "model.layers.23.self_attn.v_proj.weight": "pytorch_model-00006-of-00010.bin",
161
+ "model.layers.24.input_layernorm.weight": "pytorch_model-00007-of-00010.bin",
162
+ "model.layers.24.mlp.down_proj.weight": "pytorch_model-00007-of-00010.bin",
163
+ "model.layers.24.mlp.gate_proj.weight": "pytorch_model-00006-of-00010.bin",
164
+ "model.layers.24.mlp.up_proj.weight": "pytorch_model-00007-of-00010.bin",
165
+ "model.layers.24.post_attention_layernorm.weight": "pytorch_model-00007-of-00010.bin",
166
+ "model.layers.24.self_attn.k_proj.weight": "pytorch_model-00006-of-00010.bin",
167
+ "model.layers.24.self_attn.o_proj.weight": "pytorch_model-00006-of-00010.bin",
168
+ "model.layers.24.self_attn.q_proj.weight": "pytorch_model-00006-of-00010.bin",
169
+ "model.layers.24.self_attn.v_proj.weight": "pytorch_model-00006-of-00010.bin",
170
+ "model.layers.25.input_layernorm.weight": "pytorch_model-00007-of-00010.bin",
171
+ "model.layers.25.mlp.down_proj.weight": "pytorch_model-00007-of-00010.bin",
172
+ "model.layers.25.mlp.gate_proj.weight": "pytorch_model-00007-of-00010.bin",
173
+ "model.layers.25.mlp.up_proj.weight": "pytorch_model-00007-of-00010.bin",
174
+ "model.layers.25.post_attention_layernorm.weight": "pytorch_model-00007-of-00010.bin",
175
+ "model.layers.25.self_attn.k_proj.weight": "pytorch_model-00007-of-00010.bin",
176
+ "model.layers.25.self_attn.o_proj.weight": "pytorch_model-00007-of-00010.bin",
177
+ "model.layers.25.self_attn.q_proj.weight": "pytorch_model-00007-of-00010.bin",
178
+ "model.layers.25.self_attn.v_proj.weight": "pytorch_model-00007-of-00010.bin",
179
+ "model.layers.26.input_layernorm.weight": "pytorch_model-00007-of-00010.bin",
180
+ "model.layers.26.mlp.down_proj.weight": "pytorch_model-00007-of-00010.bin",
181
+ "model.layers.26.mlp.gate_proj.weight": "pytorch_model-00007-of-00010.bin",
182
+ "model.layers.26.mlp.up_proj.weight": "pytorch_model-00007-of-00010.bin",
183
+ "model.layers.26.post_attention_layernorm.weight": "pytorch_model-00007-of-00010.bin",
184
+ "model.layers.26.self_attn.k_proj.weight": "pytorch_model-00007-of-00010.bin",
185
+ "model.layers.26.self_attn.o_proj.weight": "pytorch_model-00007-of-00010.bin",
186
+ "model.layers.26.self_attn.q_proj.weight": "pytorch_model-00007-of-00010.bin",
187
+ "model.layers.26.self_attn.v_proj.weight": "pytorch_model-00007-of-00010.bin",
188
+ "model.layers.27.input_layernorm.weight": "pytorch_model-00007-of-00010.bin",
189
+ "model.layers.27.mlp.down_proj.weight": "pytorch_model-00007-of-00010.bin",
190
+ "model.layers.27.mlp.gate_proj.weight": "pytorch_model-00007-of-00010.bin",
191
+ "model.layers.27.mlp.up_proj.weight": "pytorch_model-00007-of-00010.bin",
192
+ "model.layers.27.post_attention_layernorm.weight": "pytorch_model-00007-of-00010.bin",
193
+ "model.layers.27.self_attn.k_proj.weight": "pytorch_model-00007-of-00010.bin",
194
+ "model.layers.27.self_attn.o_proj.weight": "pytorch_model-00007-of-00010.bin",
195
+ "model.layers.27.self_attn.q_proj.weight": "pytorch_model-00007-of-00010.bin",
196
+ "model.layers.27.self_attn.v_proj.weight": "pytorch_model-00007-of-00010.bin",
197
+ "model.layers.28.input_layernorm.weight": "pytorch_model-00007-of-00010.bin",
198
+ "model.layers.28.mlp.down_proj.weight": "pytorch_model-00007-of-00010.bin",
199
+ "model.layers.28.mlp.gate_proj.weight": "pytorch_model-00007-of-00010.bin",
200
+ "model.layers.28.mlp.up_proj.weight": "pytorch_model-00007-of-00010.bin",
201
+ "model.layers.28.post_attention_layernorm.weight": "pytorch_model-00007-of-00010.bin",
202
+ "model.layers.28.self_attn.k_proj.weight": "pytorch_model-00007-of-00010.bin",
203
+ "model.layers.28.self_attn.o_proj.weight": "pytorch_model-00007-of-00010.bin",
204
+ "model.layers.28.self_attn.q_proj.weight": "pytorch_model-00007-of-00010.bin",
205
+ "model.layers.28.self_attn.v_proj.weight": "pytorch_model-00007-of-00010.bin",
206
+ "model.layers.29.input_layernorm.weight": "pytorch_model-00008-of-00010.bin",
207
+ "model.layers.29.mlp.down_proj.weight": "pytorch_model-00008-of-00010.bin",
208
+ "model.layers.29.mlp.gate_proj.weight": "pytorch_model-00008-of-00010.bin",
209
+ "model.layers.29.mlp.up_proj.weight": "pytorch_model-00008-of-00010.bin",
210
+ "model.layers.29.post_attention_layernorm.weight": "pytorch_model-00008-of-00010.bin",
211
+ "model.layers.29.self_attn.k_proj.weight": "pytorch_model-00008-of-00010.bin",
212
+ "model.layers.29.self_attn.o_proj.weight": "pytorch_model-00008-of-00010.bin",
213
+ "model.layers.29.self_attn.q_proj.weight": "pytorch_model-00008-of-00010.bin",
214
+ "model.layers.29.self_attn.v_proj.weight": "pytorch_model-00008-of-00010.bin",
215
+ "model.layers.3.input_layernorm.weight": "pytorch_model-00002-of-00010.bin",
216
+ "model.layers.3.mlp.down_proj.weight": "pytorch_model-00002-of-00010.bin",
217
+ "model.layers.3.mlp.gate_proj.weight": "pytorch_model-00002-of-00010.bin",
218
+ "model.layers.3.mlp.up_proj.weight": "pytorch_model-00002-of-00010.bin",
219
+ "model.layers.3.post_attention_layernorm.weight": "pytorch_model-00002-of-00010.bin",
220
+ "model.layers.3.self_attn.k_proj.weight": "pytorch_model-00002-of-00010.bin",
221
+ "model.layers.3.self_attn.o_proj.weight": "pytorch_model-00002-of-00010.bin",
222
+ "model.layers.3.self_attn.q_proj.weight": "pytorch_model-00002-of-00010.bin",
223
+ "model.layers.3.self_attn.v_proj.weight": "pytorch_model-00002-of-00010.bin",
224
+ "model.layers.30.input_layernorm.weight": "pytorch_model-00008-of-00010.bin",
225
+ "model.layers.30.mlp.down_proj.weight": "pytorch_model-00008-of-00010.bin",
226
+ "model.layers.30.mlp.gate_proj.weight": "pytorch_model-00008-of-00010.bin",
227
+ "model.layers.30.mlp.up_proj.weight": "pytorch_model-00008-of-00010.bin",
228
+ "model.layers.30.post_attention_layernorm.weight": "pytorch_model-00008-of-00010.bin",
229
+ "model.layers.30.self_attn.k_proj.weight": "pytorch_model-00008-of-00010.bin",
230
+ "model.layers.30.self_attn.o_proj.weight": "pytorch_model-00008-of-00010.bin",
231
+ "model.layers.30.self_attn.q_proj.weight": "pytorch_model-00008-of-00010.bin",
232
+ "model.layers.30.self_attn.v_proj.weight": "pytorch_model-00008-of-00010.bin",
233
+ "model.layers.31.input_layernorm.weight": "pytorch_model-00008-of-00010.bin",
234
+ "model.layers.31.mlp.down_proj.weight": "pytorch_model-00008-of-00010.bin",
235
+ "model.layers.31.mlp.gate_proj.weight": "pytorch_model-00008-of-00010.bin",
236
+ "model.layers.31.mlp.up_proj.weight": "pytorch_model-00008-of-00010.bin",
237
+ "model.layers.31.post_attention_layernorm.weight": "pytorch_model-00008-of-00010.bin",
238
+ "model.layers.31.self_attn.k_proj.weight": "pytorch_model-00008-of-00010.bin",
239
+ "model.layers.31.self_attn.o_proj.weight": "pytorch_model-00008-of-00010.bin",
240
+ "model.layers.31.self_attn.q_proj.weight": "pytorch_model-00008-of-00010.bin",
241
+ "model.layers.31.self_attn.v_proj.weight": "pytorch_model-00008-of-00010.bin",
242
+ "model.layers.32.input_layernorm.weight": "pytorch_model-00008-of-00010.bin",
243
+ "model.layers.32.mlp.down_proj.weight": "pytorch_model-00008-of-00010.bin",
244
+ "model.layers.32.mlp.gate_proj.weight": "pytorch_model-00008-of-00010.bin",
245
+ "model.layers.32.mlp.up_proj.weight": "pytorch_model-00008-of-00010.bin",
246
+ "model.layers.32.post_attention_layernorm.weight": "pytorch_model-00008-of-00010.bin",
247
+ "model.layers.32.self_attn.k_proj.weight": "pytorch_model-00008-of-00010.bin",
248
+ "model.layers.32.self_attn.o_proj.weight": "pytorch_model-00008-of-00010.bin",
249
+ "model.layers.32.self_attn.q_proj.weight": "pytorch_model-00008-of-00010.bin",
250
+ "model.layers.32.self_attn.v_proj.weight": "pytorch_model-00008-of-00010.bin",
251
+ "model.layers.33.input_layernorm.weight": "pytorch_model-00009-of-00010.bin",
252
+ "model.layers.33.mlp.down_proj.weight": "pytorch_model-00009-of-00010.bin",
253
+ "model.layers.33.mlp.gate_proj.weight": "pytorch_model-00008-of-00010.bin",
254
+ "model.layers.33.mlp.up_proj.weight": "pytorch_model-00009-of-00010.bin",
255
+ "model.layers.33.post_attention_layernorm.weight": "pytorch_model-00009-of-00010.bin",
256
+ "model.layers.33.self_attn.k_proj.weight": "pytorch_model-00008-of-00010.bin",
257
+ "model.layers.33.self_attn.o_proj.weight": "pytorch_model-00008-of-00010.bin",
258
+ "model.layers.33.self_attn.q_proj.weight": "pytorch_model-00008-of-00010.bin",
259
+ "model.layers.33.self_attn.v_proj.weight": "pytorch_model-00008-of-00010.bin",
260
+ "model.layers.34.input_layernorm.weight": "pytorch_model-00009-of-00010.bin",
261
+ "model.layers.34.mlp.down_proj.weight": "pytorch_model-00009-of-00010.bin",
262
+ "model.layers.34.mlp.gate_proj.weight": "pytorch_model-00009-of-00010.bin",
263
+ "model.layers.34.mlp.up_proj.weight": "pytorch_model-00009-of-00010.bin",
264
+ "model.layers.34.post_attention_layernorm.weight": "pytorch_model-00009-of-00010.bin",
265
+ "model.layers.34.self_attn.k_proj.weight": "pytorch_model-00009-of-00010.bin",
266
+ "model.layers.34.self_attn.o_proj.weight": "pytorch_model-00009-of-00010.bin",
267
+ "model.layers.34.self_attn.q_proj.weight": "pytorch_model-00009-of-00010.bin",
268
+ "model.layers.34.self_attn.v_proj.weight": "pytorch_model-00009-of-00010.bin",
269
+ "model.layers.35.input_layernorm.weight": "pytorch_model-00009-of-00010.bin",
270
+ "model.layers.35.mlp.down_proj.weight": "pytorch_model-00009-of-00010.bin",
271
+ "model.layers.35.mlp.gate_proj.weight": "pytorch_model-00009-of-00010.bin",
272
+ "model.layers.35.mlp.up_proj.weight": "pytorch_model-00009-of-00010.bin",
273
+ "model.layers.35.post_attention_layernorm.weight": "pytorch_model-00009-of-00010.bin",
274
+ "model.layers.35.self_attn.k_proj.weight": "pytorch_model-00009-of-00010.bin",
275
+ "model.layers.35.self_attn.o_proj.weight": "pytorch_model-00009-of-00010.bin",
276
+ "model.layers.35.self_attn.q_proj.weight": "pytorch_model-00009-of-00010.bin",
277
+ "model.layers.35.self_attn.v_proj.weight": "pytorch_model-00009-of-00010.bin",
278
+ "model.layers.36.input_layernorm.weight": "pytorch_model-00009-of-00010.bin",
279
+ "model.layers.36.mlp.down_proj.weight": "pytorch_model-00009-of-00010.bin",
280
+ "model.layers.36.mlp.gate_proj.weight": "pytorch_model-00009-of-00010.bin",
281
+ "model.layers.36.mlp.up_proj.weight": "pytorch_model-00009-of-00010.bin",
282
+ "model.layers.36.post_attention_layernorm.weight": "pytorch_model-00009-of-00010.bin",
283
+ "model.layers.36.self_attn.k_proj.weight": "pytorch_model-00009-of-00010.bin",
284
+ "model.layers.36.self_attn.o_proj.weight": "pytorch_model-00009-of-00010.bin",
285
+ "model.layers.36.self_attn.q_proj.weight": "pytorch_model-00009-of-00010.bin",
286
+ "model.layers.36.self_attn.v_proj.weight": "pytorch_model-00009-of-00010.bin",
287
+ "model.layers.37.input_layernorm.weight": "pytorch_model-00009-of-00010.bin",
288
+ "model.layers.37.mlp.down_proj.weight": "pytorch_model-00009-of-00010.bin",
289
+ "model.layers.37.mlp.gate_proj.weight": "pytorch_model-00009-of-00010.bin",
290
+ "model.layers.37.mlp.up_proj.weight": "pytorch_model-00009-of-00010.bin",
291
+ "model.layers.37.post_attention_layernorm.weight": "pytorch_model-00009-of-00010.bin",
292
+ "model.layers.37.self_attn.k_proj.weight": "pytorch_model-00009-of-00010.bin",
293
+ "model.layers.37.self_attn.o_proj.weight": "pytorch_model-00009-of-00010.bin",
294
+ "model.layers.37.self_attn.q_proj.weight": "pytorch_model-00009-of-00010.bin",
295
+ "model.layers.37.self_attn.v_proj.weight": "pytorch_model-00009-of-00010.bin",
296
+ "model.layers.38.input_layernorm.weight": "pytorch_model-00010-of-00010.bin",
297
+ "model.layers.38.mlp.down_proj.weight": "pytorch_model-00010-of-00010.bin",
298
+ "model.layers.38.mlp.gate_proj.weight": "pytorch_model-00010-of-00010.bin",
299
+ "model.layers.38.mlp.up_proj.weight": "pytorch_model-00010-of-00010.bin",
300
+ "model.layers.38.post_attention_layernorm.weight": "pytorch_model-00010-of-00010.bin",
301
+ "model.layers.38.self_attn.k_proj.weight": "pytorch_model-00010-of-00010.bin",
302
+ "model.layers.38.self_attn.o_proj.weight": "pytorch_model-00010-of-00010.bin",
303
+ "model.layers.38.self_attn.q_proj.weight": "pytorch_model-00010-of-00010.bin",
304
+ "model.layers.38.self_attn.v_proj.weight": "pytorch_model-00010-of-00010.bin",
305
+ "model.layers.39.input_layernorm.weight": "pytorch_model-00010-of-00010.bin",
306
+ "model.layers.39.mlp.down_proj.weight": "pytorch_model-00010-of-00010.bin",
307
+ "model.layers.39.mlp.gate_proj.weight": "pytorch_model-00010-of-00010.bin",
308
+ "model.layers.39.mlp.up_proj.weight": "pytorch_model-00010-of-00010.bin",
309
+ "model.layers.39.post_attention_layernorm.weight": "pytorch_model-00010-of-00010.bin",
310
+ "model.layers.39.self_attn.k_proj.weight": "pytorch_model-00010-of-00010.bin",
311
+ "model.layers.39.self_attn.o_proj.weight": "pytorch_model-00010-of-00010.bin",
312
+ "model.layers.39.self_attn.q_proj.weight": "pytorch_model-00010-of-00010.bin",
313
+ "model.layers.39.self_attn.v_proj.weight": "pytorch_model-00010-of-00010.bin",
314
+ "model.layers.4.input_layernorm.weight": "pytorch_model-00002-of-00010.bin",
315
+ "model.layers.4.mlp.down_proj.weight": "pytorch_model-00002-of-00010.bin",
316
+ "model.layers.4.mlp.gate_proj.weight": "pytorch_model-00002-of-00010.bin",
317
+ "model.layers.4.mlp.up_proj.weight": "pytorch_model-00002-of-00010.bin",
318
+ "model.layers.4.post_attention_layernorm.weight": "pytorch_model-00002-of-00010.bin",
319
+ "model.layers.4.self_attn.k_proj.weight": "pytorch_model-00002-of-00010.bin",
320
+ "model.layers.4.self_attn.o_proj.weight": "pytorch_model-00002-of-00010.bin",
321
+ "model.layers.4.self_attn.q_proj.weight": "pytorch_model-00002-of-00010.bin",
322
+ "model.layers.4.self_attn.v_proj.weight": "pytorch_model-00002-of-00010.bin",
323
+ "model.layers.5.input_layernorm.weight": "pytorch_model-00002-of-00010.bin",
324
+ "model.layers.5.mlp.down_proj.weight": "pytorch_model-00002-of-00010.bin",
325
+ "model.layers.5.mlp.gate_proj.weight": "pytorch_model-00002-of-00010.bin",
326
+ "model.layers.5.mlp.up_proj.weight": "pytorch_model-00002-of-00010.bin",
327
+ "model.layers.5.post_attention_layernorm.weight": "pytorch_model-00002-of-00010.bin",
328
+ "model.layers.5.self_attn.k_proj.weight": "pytorch_model-00002-of-00010.bin",
329
+ "model.layers.5.self_attn.o_proj.weight": "pytorch_model-00002-of-00010.bin",
330
+ "model.layers.5.self_attn.q_proj.weight": "pytorch_model-00002-of-00010.bin",
331
+ "model.layers.5.self_attn.v_proj.weight": "pytorch_model-00002-of-00010.bin",
332
+ "model.layers.6.input_layernorm.weight": "pytorch_model-00003-of-00010.bin",
333
+ "model.layers.6.mlp.down_proj.weight": "pytorch_model-00003-of-00010.bin",
334
+ "model.layers.6.mlp.gate_proj.weight": "pytorch_model-00002-of-00010.bin",
335
+ "model.layers.6.mlp.up_proj.weight": "pytorch_model-00003-of-00010.bin",
336
+ "model.layers.6.post_attention_layernorm.weight": "pytorch_model-00003-of-00010.bin",
337
+ "model.layers.6.self_attn.k_proj.weight": "pytorch_model-00002-of-00010.bin",
338
+ "model.layers.6.self_attn.o_proj.weight": "pytorch_model-00002-of-00010.bin",
339
+ "model.layers.6.self_attn.q_proj.weight": "pytorch_model-00002-of-00010.bin",
340
+ "model.layers.6.self_attn.v_proj.weight": "pytorch_model-00002-of-00010.bin",
341
+ "model.layers.7.input_layernorm.weight": "pytorch_model-00003-of-00010.bin",
342
+ "model.layers.7.mlp.down_proj.weight": "pytorch_model-00003-of-00010.bin",
343
+ "model.layers.7.mlp.gate_proj.weight": "pytorch_model-00003-of-00010.bin",
344
+ "model.layers.7.mlp.up_proj.weight": "pytorch_model-00003-of-00010.bin",
345
+ "model.layers.7.post_attention_layernorm.weight": "pytorch_model-00003-of-00010.bin",
346
+ "model.layers.7.self_attn.k_proj.weight": "pytorch_model-00003-of-00010.bin",
347
+ "model.layers.7.self_attn.o_proj.weight": "pytorch_model-00003-of-00010.bin",
348
+ "model.layers.7.self_attn.q_proj.weight": "pytorch_model-00003-of-00010.bin",
349
+ "model.layers.7.self_attn.v_proj.weight": "pytorch_model-00003-of-00010.bin",
350
+ "model.layers.8.input_layernorm.weight": "pytorch_model-00003-of-00010.bin",
351
+ "model.layers.8.mlp.down_proj.weight": "pytorch_model-00003-of-00010.bin",
352
+ "model.layers.8.mlp.gate_proj.weight": "pytorch_model-00003-of-00010.bin",
353
+ "model.layers.8.mlp.up_proj.weight": "pytorch_model-00003-of-00010.bin",
354
+ "model.layers.8.post_attention_layernorm.weight": "pytorch_model-00003-of-00010.bin",
355
+ "model.layers.8.self_attn.k_proj.weight": "pytorch_model-00003-of-00010.bin",
356
+ "model.layers.8.self_attn.o_proj.weight": "pytorch_model-00003-of-00010.bin",
357
+ "model.layers.8.self_attn.q_proj.weight": "pytorch_model-00003-of-00010.bin",
358
+ "model.layers.8.self_attn.v_proj.weight": "pytorch_model-00003-of-00010.bin",
359
+ "model.layers.9.input_layernorm.weight": "pytorch_model-00003-of-00010.bin",
360
+ "model.layers.9.mlp.down_proj.weight": "pytorch_model-00003-of-00010.bin",
361
+ "model.layers.9.mlp.gate_proj.weight": "pytorch_model-00003-of-00010.bin",
362
+ "model.layers.9.mlp.up_proj.weight": "pytorch_model-00003-of-00010.bin",
363
+ "model.layers.9.post_attention_layernorm.weight": "pytorch_model-00003-of-00010.bin",
364
+ "model.layers.9.self_attn.k_proj.weight": "pytorch_model-00003-of-00010.bin",
365
+ "model.layers.9.self_attn.o_proj.weight": "pytorch_model-00003-of-00010.bin",
366
+ "model.layers.9.self_attn.q_proj.weight": "pytorch_model-00003-of-00010.bin",
367
+ "model.layers.9.self_attn.v_proj.weight": "pytorch_model-00003-of-00010.bin",
368
+ "model.norm.weight": "pytorch_model-00010-of-00010.bin"
369
+ }
370
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b0240ce510f08e6c2041724e9043e33be9d251d1e4a4d94eb68cd47b954b61d2
+ size 17078292
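`tokenizer.json` is stored as a Git LFS pointer (see the `.gitattributes` change at the top of this commit). Once the repository is downloaded, a quick sanity check that the tokenizer matches the 131072-entry vocabulary and the special tokens listed above — a minimal sketch, assuming the same local checkpoint path used in the README:

```python
from transformers import AutoTokenizer

# Hypothetical local path to the downloaded checkpoint directory.
tokenizer = AutoTokenizer.from_pretrained("path/to/Krutrim-2_model")

print(len(tokenizer))  # expected to match vocab_size = 131072 from config.json
print(tokenizer.bos_token, tokenizer.eos_token, tokenizer.pad_token, tokenizer.unk_token)  # <s> </s> <pad> <unk>
```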
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff