jncraton commited on
Commit
bf74a50
1 Parent(s): 72c1830
README.md ADDED
@@ -0,0 +1,160 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: cc-by-nc-4.0
3
+ language:
4
+ - en
5
+ pipeline_tag: text-generation
6
+ widget:
7
+ - text: >-
8
+ Below is an instruction that describes a task.
9
+
10
+ Write a response that appropriately completes the request.
11
+
12
+
13
+
14
+ ### Instruction:
15
+
16
+ how can I become more healthy?
17
+
18
+
19
+ ### Response:
20
+ example_title: example
21
+ ---
22
+
23
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
24
+ should probably proofread and complete it, then remove this comment. -->
25
+
26
+ <p align="center" width="100%">
27
+ <a><img src="https://raw.githubusercontent.com/mbzuai-nlp/lamini-lm/main/images/lamini.png" alt="Title" style="width: 100%; min-width: 300px; display: block; margin: auto;"></a>
28
+ </p>
29
+
30
+ # LaMini-GPT-124M
31
+
32
+ [![Model License](https://img.shields.io/badge/Model%20License-CC%20By%20NC%204.0-red.svg)]()
33
+
34
+ This model is one of our LaMini-LM model series in paper "[LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions](https://github.com/mbzuai-nlp/lamini-lm)".
35
+ This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on [LaMini-instruction dataset](https://huggingface.co/datasets/MBZUAI/LaMini-instruction) that contains 2.58M samples for instruction fine-tuning. For more information about our dataset, please refer to our [project repository](https://github.com/mbzuai-nlp/lamini-lm/).
36
+ You can view other models of LaMini-LM series as follows. Models with ✩ are those with the best overall performance given their size/architecture, hence we recommend using them. More details can be seen in our paper.
37
+
38
+ <table>
39
+ <thead>
40
+ <tr>
41
+ <th>Base model</th>
42
+ <th colspan="4">LaMini-LM series (#parameters)</th>
43
+ </tr>
44
+ </thead>
45
+ <tbody>
46
+ <tr>
47
+ <td>T5</td>
48
+ <td><a href="https://huggingface.co/MBZUAI/lamini-t5-61m" target="_blank" rel="noopener noreferrer">LaMini-T5-61M</a></td>
49
+ <td><a href="https://huggingface.co/MBZUAI/lamini-t5-223m" target="_blank" rel="noopener noreferrer">LaMini-T5-223M</a></td>
50
+ <td><a href="https://huggingface.co/MBZUAI/lamini-t5-738m" target="_blank" rel="noopener noreferrer">LaMini-T5-738M</a></td>
51
+ <td></td>
52
+ </tr>
53
+ <tr>
54
+ <td>Flan-T5</td>
55
+ <td><a href="https://huggingface.co/MBZUAI/lamini-flan-t5-77m" target="_blank" rel="noopener noreferrer">LaMini-Flan-T5-77M</a>✩</td>
56
+ <td><a href="https://huggingface.co/MBZUAI/lamini-flan-t5-248m" target="_blank" rel="noopener noreferrer">LaMini-Flan-T5-248M</a>✩</td>
57
+ <td><a href="https://huggingface.co/MBZUAI/lamini-flan-t5-783m" target="_blank" rel="noopener noreferrer">LaMini-Flan-T5-783M</a>✩</td>
58
+ <td></td>
59
+ </tr>
60
+ <tr>
61
+ <td>Cerebras-GPT</td>
62
+ <td><a href="https://huggingface.co/MBZUAI/lamini-cerebras-111m" target="_blank" rel="noopener noreferrer">LaMini-Cerebras-111M</a></td>
63
+ <td><a href="https://huggingface.co/MBZUAI/lamini-cerebras-256m" target="_blank" rel="noopener noreferrer">LaMini-Cerebras-256M</a></td>
64
+ <td><a href="https://huggingface.co/MBZUAI/lamini-cerebras-590m" target="_blank" rel="noopener noreferrer">LaMini-Cerebras-590M</a></td>
65
+ <td><a href="https://huggingface.co/MBZUAI/lamini-cerebras-1.3b" target="_blank" rel="noopener noreferrer">LaMini-Cerebras-1.3B</a></td>
66
+ </tr>
67
+ <tr>
68
+ <td>GPT-2</td>
69
+ <td><a href="https://huggingface.co/MBZUAI/lamini-gpt-124m" target="_blank" rel="noopener noreferrer">LaMini-GPT-124M</a>✩</td>
70
+ <td><a href="https://huggingface.co/MBZUAI/lamini-gpt-774m" target="_blank" rel="noopener noreferrer">LaMini-GPT-774M</a>✩</td>
71
+ <td><a href="https://huggingface.co/MBZUAI/lamini-gpt-1.5b" target="_blank" rel="noopener noreferrer">LaMini-GPT-1.5B</a>✩</td>
72
+ <td></td>
73
+ </tr>
74
+ <tr>
75
+ <td>GPT-Neo</td>
76
+ <td><a href="https://huggingface.co/MBZUAI/lamini-neo-125m" target="_blank" rel="noopener noreferrer">LaMini-Neo-125M</a></td>
77
+ <td><a href="https://huggingface.co/MBZUAI/lamini-neo-1.3b" target="_blank" rel="noopener noreferrer">LaMini-Neo-1.3B</a></td>
78
+ <td></td>
79
+ <td></td>
80
+ </tr>
81
+ <tr>
82
+ <td>GPT-J</td>
83
+ <td colspan="4">coming soon</td>
84
+ </tr>
85
+ <tr>
86
+ <td>LLaMA</td>
87
+ <td colspan="4">coming soon</td>
88
+ </tr>
89
+
90
+
91
+ </tbody>
92
+ </table>
93
+
94
+
95
+ ## Use
96
+
97
+ ### Intended use
98
+ We recommend using the model to respond to human instructions written in natural language.
99
+ Since this decoder-only model is fine-tuned with wrapper text, we suggest using the same wrapper text to achieve the best performance.
100
+ See the example on the right or the code below.
101
+
102
+ We now show you how to load and use our model using HuggingFace `pipeline()`.
103
+
104
+ ```python
105
+ # pip install -q transformers
106
+ from transformers import pipeline
107
+
108
+ checkpoint = "{model_name}"
109
+
110
+ model = pipeline('text-generation', model = checkpoint)
111
+
112
+ instruction = 'Please let me know your thoughts on the given place and why you think it deserves to be visited: \n"Barcelona, Spain"'
113
+
114
+ input_prompt = f"Below is an instruction that describes a task. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Response:"
115
+
116
+ generated_text = model(input_prompt, max_length=512, do_sample=True)[0]['generated_text']
117
+
118
+ print("Response", generated_text)
119
+ ```
120
+
121
+ ## Training Procedure
122
+
123
+ <p align="center" width="100%">
124
+ <a><img src="https://raw.githubusercontent.com/mbzuai-nlp/lamini-lm/main/images/lamini-pipeline.drawio.png" alt="Title" style="width: 100%; min-width: 250px; display: block; margin: auto;"></a>
125
+ </p>
126
+
127
+ We initialize with [gpt2](https://huggingface.co/gpt2) and fine-tune it on our [LaMini-instruction dataset](https://huggingface.co/datasets/MBZUAI/LaMini-instruction). Its total number of parameters is 124M.
128
+
129
+ ### Training Hyperparameters
130
+
131
+
132
+
133
+ ## Evaluation
134
+ We conducted two sets of evaluations: automatic evaluation on downstream NLP tasks and human evaluation on user-oriented instructions. For more detail, please refer to our [paper]().
135
+
136
+ ## Limitations
137
+
138
+ More information needed
139
+
140
+
141
+ # Citation
142
+
143
+ ```bibtex
144
+ @article{lamini-lm,
145
+ author = {Minghao Wu and
146
+ Abdul Waheed and
147
+ Chiyu Zhang and
148
+ Muhammad Abdul-Mageed and
149
+ Alham Fikri Aji
150
+ },
151
+ title = {LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions},
152
+ journal = {CoRR},
153
+ volume = {abs/2304.14402},
154
+ year = {2023},
155
+ url = {https://arxiv.org/abs/2304.14402},
156
+ eprinttype = {arXiv},
157
+ eprint = {2304.14402}
158
+ }
159
+
160
+ ```
added_tokens.json ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ {
2
+ "[PAD]": 50257
3
+ }
config.json ADDED
@@ -0,0 +1,6 @@
 
 
 
 
 
 
 
1
+ {
2
+ "bos_token": "<|endoftext|>",
3
+ "eos_token": "<|endoftext|>",
4
+ "layer_norm_epsilon": null,
5
+ "unk_token": "<|endoftext|>"
6
+ }
generation_config.json ADDED
@@ -0,0 +1,6 @@
 
 
 
 
 
 
 
1
+ {
2
+ "_from_model_config": true,
3
+ "bos_token_id": 50256,
4
+ "eos_token_id": 50256,
5
+ "transformers_version": "4.27.4"
6
+ }
model.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c3f3cf3fa6c344ee07abdb1187d90582a7198750f15d5343f80f6c560e5ecfe1
3
+ size 127708029
special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "bos_token": {
3
+ "content": "<|endoftext|>",
4
+ "lstrip": false,
5
+ "normalized": true,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "eos_token": {
10
+ "content": "<|endoftext|>",
11
+ "lstrip": false,
12
+ "normalized": true,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": "[PAD]",
17
+ "unk_token": {
18
+ "content": "<|endoftext|>",
19
+ "lstrip": false,
20
+ "normalized": true,
21
+ "rstrip": false,
22
+ "single_word": false
23
+ }
24
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,34 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "add_bos_token": false,
3
+ "add_prefix_space": false,
4
+ "bos_token": {
5
+ "__type": "AddedToken",
6
+ "content": "<|endoftext|>",
7
+ "lstrip": false,
8
+ "normalized": true,
9
+ "rstrip": false,
10
+ "single_word": false
11
+ },
12
+ "eos_token": {
13
+ "__type": "AddedToken",
14
+ "content": "<|endoftext|>",
15
+ "lstrip": false,
16
+ "normalized": true,
17
+ "rstrip": false,
18
+ "single_word": false
19
+ },
20
+ "errors": "replace",
21
+ "model_max_length": 512,
22
+ "pad_token": null,
23
+ "padding_side": "right",
24
+ "special_tokens_map_file": null,
25
+ "tokenizer_class": "GPT2Tokenizer",
26
+ "unk_token": {
27
+ "__type": "AddedToken",
28
+ "content": "<|endoftext|>",
29
+ "lstrip": false,
30
+ "normalized": true,
31
+ "rstrip": false,
32
+ "single_word": false
33
+ }
34
+ }
vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
vocabulary.json ADDED
The diff for this file is too large to render. See raw diff