michaelfeil committed
Commit
20ec98d
1 Parent(s): 2901a98

Upload OpenAssistant/stablelm-7b-sft-v7-epoch-3 ctranslate fp16 weights

Files changed (4)
  1. README.md +196 -0
  2. generation_config.json +6 -0
  3. model.bin +2 -2
  4. special_tokens_map.json +14 -0
README.md ADDED
@@ -0,0 +1,196 @@
---
language:
- en
tags:
- ctranslate2
- int8
- float16
- sft
pipeline_tag: text-generation
widget:
- text: >-
    <|prompter|>What is a meme, and what's the history behind this
    word?<|endoftext|><|assistant|>
- text: <|prompter|>What's the Earth total population<|endoftext|><|assistant|>
- text: >-
    <|prompter|>Write a story about future of AI
    development<|endoftext|><|assistant|>
---
# Fast-Inference with CTranslate2
Speed up inference while reducing memory by 2x-4x, using int8 inference in C++ on CPU or GPU.

Quantized version of [OpenAssistant/stablelm-7b-sft-v7-epoch-3](https://huggingface.co/OpenAssistant/stablelm-7b-sft-v7-epoch-3)
```bash
pip install "hf-hub-ctranslate2>=2.0.8"
```
Converted on 2023-05-22 using
```bash
ct2-transformers-converter --model OpenAssistant/stablelm-7b-sft-v7-epoch-3 --output_dir /home/michael/tmp-ct2fast-stablelm-7b-sft-v7-epoch-3 --force --copy_files tokenizer.json README.md tokenizer_config.json generation_config.json special_tokens_map.json .gitattributes --quantization float16
```

Checkpoint compatible with [ctranslate2>=3.13.0](https://github.com/OpenNMT/CTranslate2) and [hf-hub-ctranslate2>=2.0.6](https://github.com/michaelfeil/hf-hub-ctranslate2):
- `compute_type=int8_float16` for `device="cuda"`
- `compute_type=int8` for `device="cpu"` (a CPU variant is sketched after the example below)
```python
from hf_hub_ctranslate2 import TranslatorCT2fromHfHub, GeneratorCT2fromHfHub
from transformers import AutoTokenizer

model_name = "michaelfeil/ct2fast-stablelm-7b-sft-v7-epoch-3"
# use either TranslatorCT2fromHfHub or GeneratorCT2fromHfHub here, depending on the model
model = GeneratorCT2fromHfHub(
    # load in int8_float16 on CUDA
    model_name_or_path=model_name,
    device="cuda",
    compute_type="int8_float16",
    # tokenizer=AutoTokenizer.from_pretrained("OpenAssistant/stablelm-7b-sft-v7-epoch-3")
)
outputs = model.generate(
    text=["def print_hello_world():", "def hello_name(name:"],
    max_length=64,
)
print(outputs)
```
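For CPU-only inference, the same loader can be used with the settings from the compatibility list above (a minimal sketch reusing the API from the snippet above; the prompts and `max_length` are illustrative):

```python
from hf_hub_ctranslate2 import GeneratorCT2fromHfHub

# CPU variant: int8 weights, no GPU required
model_cpu = GeneratorCT2fromHfHub(
    model_name_or_path="michaelfeil/ct2fast-stablelm-7b-sft-v7-epoch-3",
    device="cpu",
    compute_type="int8",
)
print(model_cpu.generate(text=["def print_hello_world():"], max_length=64))
```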

# Licence and other remarks:
This is just a quantized version. Licence conditions are intended to be identical to those of the original Hugging Face repo.

# Original description

# Open-Assistant StableLM-7B SFT-7 Model

This is the 7th iteration English supervised-fine-tuning (SFT) model of
the [Open-Assistant](https://github.com/LAION-AI/Open-Assistant) project.
It is based on a StableLM 7B that was fine-tuned on human demonstrations
of assistant conversations collected through the
[https://open-assistant.io/](https://open-assistant.io/) human feedback web
app before April 12, 2023.

## Model Details

- **Developed by:** [Open-Assistant Contributors](https://open-assistant.io/)
- **Model type:** Transformer-based Language Model
- **Language:** English
- **Finetuned from:** [stabilityai/stablelm-base-alpha-7b](https://huggingface.co/stabilityai/stablelm-base-alpha-7b)
- **Code:** [Open-Assistant/model/model_training](https://github.com/LAION-AI/Open-Assistant/tree/main/model/model_training)
- **Demo:** TODO
- **License:** Creative Commons license ([CC BY-SA-4.0](https://creativecommons.org/licenses/by-sa/4.0/))
- **Contact:** [Open-Assistant Discord](https://ykilcher.com/open-assistant-discord)

## Prompting

Two special tokens are used to mark the beginning of user and assistant turns:
`<|prompter|>` and `<|assistant|>`. Each turn ends with an `<|endoftext|>` token.

Input prompt example:
```
<|prompter|>What is a meme, and what's the history behind this word?<|endoftext|><|assistant|>
```
The input ends with the `<|assistant|>` token to signal that the model should
start generating the assistant reply.
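
As a concrete illustration, the prompt string can be assembled from these tokens and passed to the generator from the usage example above (a minimal sketch; `build_prompt` is a hypothetical helper, and the token strings match special_tokens_map.json in this repo):

```python
# Assemble an Open-Assistant-style prompt from the special tokens.
# build_prompt is a hypothetical helper; the token strings match
# special_tokens_map.json in this repo.
def build_prompt(user_message: str) -> str:
    return f"<|prompter|>{user_message}<|endoftext|><|assistant|>"

prompt = build_prompt("What is a meme, and what's the history behind this word?")
# `model` is the GeneratorCT2fromHfHub instance from the usage example above:
# outputs = model.generate(text=[prompt], max_length=256)
```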

## Dev Details

- wandb: https://wandb.ai/open-assistant/supervised-finetuning/runs/08dfhyuc
- base model: [stabilityai/stablelm-base-alpha-7b](https://huggingface.co/stabilityai/stablelm-base-alpha-7b)
- checkpoint: 3 epochs (12000 steps)

command: `deepspeed trainer_sft.py --configs defaults stablelm-7b oasst-mix --cache_dir /home/ubuntu/data_cache --output_dir .saved/stable-lm-7b-1 --num_train_epochs 4 --deepspeed`

data:
```yaml
oasst-mix:
  save_strategy: epoch
  sort_by_length: false
  use_custom_sampler: false
  datasets:
    - oasst_export:
        lang: "bg,ca,cs,da,de,en,es,fr,hr,hu,it,nl,pl,pt,ro,ru,sl,sr,sv,uk"
        input_file_path: 2023-04-12_oasst_release_ready_synth.jsonl.gz
    - vicuna:
        val_split: 0.05
        max_val_set: 800
        fraction: 1.0
    - dolly15k:
        val_split: 0.05
        max_val_set: 300
    - grade_school_math_instructions:
        val_split: 0.05
    - code_alpaca:
        val_split: 0.05
        max_val_set: 250
```

stablelm:
```yaml
stablelm-7b:
  dtype: fp16
  log_dir: stablelm_log_7b
  model_name: stabilityai/stablelm-base-alpha-7b
  output_dir: stablelm_7b
  max_length: 4096
  warmup_steps: 100
  gradient_checkpointing: true
  gradient_accumulation_steps: 2
  per_device_train_batch_size: 4
  per_device_eval_batch_size: 4
  eval_steps: 100
  save_steps: 500
  num_train_epochs: 4
  save_total_limit: 4
  use_flash_attention: true
```

zero config:
```
{
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "initial_scale_power": 16,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "bf16": {
    "enabled": "auto"
  },
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": "auto",
      "betas": "auto",
      "eps": "auto",
      "weight_decay": "auto"
    }
  },
  "scheduler": {
    "type": "WarmupDecayLR",
    "params": {
      "warmup_min_lr": "auto",
      "warmup_max_lr": "auto",
      "warmup_num_steps": "auto",
      "total_num_steps": "auto"
    }
  },
  "zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "allgather_bucket_size": 1e9,
    "overlap_comm": false,
    "reduce_scatter": true,
    "reduce_bucket_size": 1e9,
    "contiguous_gradients": true
  },
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "steps_per_print": 2000,
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "wall_clock_breakdown": false
}
```
generation_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 0,
+   "eos_token_id": 0,
+   "transformers_version": "4.28.0.dev0"
+ }
model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:d471f26343e54973bcafe1c34afcb07c9691c4f2d734ca7432c74b67a25f5911
- size 7872099846
+ oid sha256:26f928a83c8c64129b8c886e2f9dd86b86e0f7583c2cadcb7583bc0cbe3a5058
+ size 15733850934
special_tokens_map.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "additional_special_tokens": [
+     "<|prefix_end|>",
+     "<|prefix_begin|>",
+     "<|assistant|>",
+     "<|prompter|>",
+     "<|system|>"
+   ],
+   "bos_token": "<|endoftext|>",
+   "eos_token": "<|endoftext|>",
+   "pad_token": "<|padding|>",
+   "sep_token": "<|endoftext|>",
+   "unk_token": "<|endoftext|>"
+ }