---
language:
- de
- bg
- cs
- da
- el
- en
- es
- et
- fi
- fr
- ga
- hr
- hu
- it
- lt
- lv
- mt
- nl
- pl
- pt
- ro
- sl
- sv
- sk
metrics:
- accuracy
- bleu
pipeline_tag: text-generation
library_name: transformers
base_model:
- openGPT-X/Teuken-7B-base-v0.4
license: apache-2.0
---
# Model Card for Teuken-7B-chat-v0.4

Teuken-7B-chat-v0.4 is an instruction-tuned version of Teuken-7B-base-v0.4.

### Model Description

<!-- Provide a longer summary of what this model is. -->

- **Developed by:** Fraunhofer IAIS
- **Funded by:** German Federal Ministry for Economic Affairs and Climate Action (BMWK) in the context of the OpenGPT-X project
- **Model type:** Transformer-based decoder-only model
- **Language(s) (NLP):** bg, cs, da, de, el, en, es, et, fi, fr, ga, hr, hu, it, lt, lv, mt, nl, pl, pt, ro, sk, sl, sv
- **Shared by:** Fraunhofer IAIS

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

Teuken-7B-chat-v0.4 is intended for commercial and research use in all 24 official European Union languages. Because it covers all 24 EU languages, it delivers more stable results across these languages and reflects European values in its answers better than English-centric models. It is therefore specialized for multilingual tasks.

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

The model is not intended for use in math and coding tasks.

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

Teuken-7B-chat-v0.4 is an instruction-tuned version of Teuken-7B-base-v0.4 and, as such, is not completely free from biases and hallucinations.

## How to Get Started with the Model

### Usage

The model requires the transformers, sentencepiece, and torch libraries.
After installation, here's an example of how to use the model.

The prompt template for the fine-tuned model is defined as follows:
```python
user = "Hi!"
lang_code = "DE"
# System messages per language. The German ("DE") message translates to:
# "A conversation between a human and an artificial intelligence assistant.
# The assistant gives helpful and polite answers to the human's questions."
system_messages = {
    "EN": "A chat between a human and an artificial intelligence assistant."
    " The assistant gives helpful and polite answers to the human's questions.",
    "DE": "Ein Gespräch zwischen einem Menschen und einem Assistenten mit künstlicher Intelligenz."
    " Der Assistent gibt hilfreiche und höfliche Antworten auf die Fragen des Menschen.",
}

prompt = f"System: {system_messages[lang_code]}\nUser: {user}\nAssistant:<s>"
```
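
For illustration, with `lang_code = "EN"` the template above renders to:

```
System: A chat between a human and an artificial intelligence assistant. The assistant gives helpful and polite answers to the human's questions.
User: Hi!
Assistant:<s>
```

The prompt string can then be tokenized and passed to the model: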
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "openGPT-X/Teuken-7B-chat-v0.4"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name, trust_remote_code=True, torch_dtype=torch.bfloat16
).to(device)
inputs = tokenizer(prompt, return_tensors="pt")
inputs = {k: v.to(device) for k, v in inputs.items()}  # Move inputs to the same device as the model
output = model.generate(input_ids=inputs["input_ids"], max_new_tokens=1000, do_sample=True)
result = tokenizer.decode(output[0])  # Decode the first (and only) sequence in the batch
print(result)
```

This example demonstrates how to load the model and tokenizer, prepare the input, generate text, and print the result.

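The two snippets can also be combined into a small reusable helper. The sketch below reuses the `model`, `tokenizer`, `device`, and `system_messages` objects defined above; the function name and the idea of stripping the echoed prompt are illustrative conveniences, not part of the model's API:

```python
def generate_response(user: str, lang_code: str = "EN", max_new_tokens: int = 1000) -> str:
    """Build the prompt for the given language and return only the model's reply."""
    prompt = f"System: {system_messages[lang_code]}\nUser: {user}\nAssistant:<s>"
    inputs = tokenizer(prompt, return_tensors="pt")
    inputs = {k: v.to(device) for k, v in inputs.items()}
    output = model.generate(input_ids=inputs["input_ids"], max_new_tokens=max_new_tokens, do_sample=True)
    # Slice off the prompt tokens so only the newly generated text is decoded.
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

print(generate_response("Hallo, wie geht es dir?", lang_code="DE"))  # "Hello, how are you?"
```
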
## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

To compose the final instruction-tuning dataset, termed "Honey", we first include all German examples. We then aim to include roughly the same number of English examples as German ones (a sketch of this selection logic follows the list):
1. Add all multi-turn examples
2. Add the entire code_alpaca dataset subset
3. Add the entire lmsys_chat_1m_high_quality_train_en dataset subset
4. For the remaining dataset subsets ("open_orca", "evol_instruct_143k", "evol_instruct_70k", "bactrianx_EN"), add the examples with the highest reward scores ("quality score") so that each dataset subset contributes an equal number of high-quality examples

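A minimal sketch of that selection procedure, assuming each subset is a list of example dicts carrying a precomputed reward under a `quality_score` key (the key name and the `multi_turn` subset name are illustrative assumptions):

```python
def compose_honey_english(subsets: dict[str, list[dict]], budget: int) -> list[dict]:
    """Select English examples to roughly match the number of German examples."""
    selected = []
    # Steps 1-3: take these subsets in full.
    for name in ["multi_turn", "code_alpaca", "lmsys_chat_1m_high_quality_train_en"]:
        selected.extend(subsets[name])
    # Step 4: fill the remaining budget equally from the other subsets,
    # taking each subset's highest-scoring examples first.
    remaining = ["open_orca", "evol_instruct_143k", "evol_instruct_70k", "bactrianx_EN"]
    per_subset = max(0, budget - len(selected)) // len(remaining)
    for name in remaining:
        ranked = sorted(subsets[name], key=lambda ex: ex["quality_score"], reverse=True)
        selected.extend(ranked[:per_subset])
    return selected

# Usage: match the English selection to the German example count.
# honey_en = compose_honey_english(english_subsets, budget=num_german_examples)
```
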
#### Dataset Sizes Before Composition

##### English

##### German

### Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

Instruction fine-tuning of Teuken-7B-base-v0.4.

#### Training Hyperparameters

- **Training regime:** bf16 mixed precision <!--fp32, fp16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

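For readers unfamiliar with the term, the following generic PyTorch snippet illustrates what a bf16 mixed-precision step looks like; it is a simplified stand-in, not the actual OpenGPT-X training loop (the learning rate and betas are taken from the hyper-parameter table below):

```python
import torch

# Generic bf16 mixed-precision step: the forward pass runs under autocast in
# bfloat16, while parameters and optimizer state remain in fp32.
model = torch.nn.Linear(4096, 4096).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, betas=(0.9, 0.95))

x = torch.randn(8, 4096, device="cuda")
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    loss = model(x).pow(2).mean()
loss.backward()   # bf16 keeps fp32's exponent range, so no GradScaler is needed
optimizer.step()
optimizer.zero_grad()
```
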
## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

### Testing Data, Factors & Metrics

#### Testing Data

<!-- This should link to a Dataset Card if possible. -->

The model was evaluated in 21 languages on ARC, GSM8K, HellaSwag, TruthfulQA, Translation, and MMLU. Results can be found on the [European LLM Leaderboard](https://huggingface.co/spaces/openGPT-X/european-llm-leaderboard).

## Technical Specifications

### Model Architecture and Objective

| Hyper-Parameter | Value |
|----------------------------|----------|
| Training Objective | CLM |
| Activation Function | SwiGLU |
| Seq Length | 4096 |
| Position Embeddings | Rotary |
| Num Layers | 32 |
| Hidden Size | 4096 |
| FFN Hidden Size | 13440 |
| Num Attention Heads | 32 |
| Head Dim | 128 |
| Group Query Attention | yes |
| Num Query Groups | 2 |
| Normalization | RMSNorm |
| Learning Rate | 3e-4 |
| Min Learning Rate | 3e-5 |
| Disable Bias in Linear | yes |
| Hidden Dropout | 0.0 |
| Attention Dropout | 0.0 |
| Optimizer | AdamW |
| Beta1 | 0.9 |
| Beta2 | 0.95 |
| Sequence Parallelism | |
| Data Type | bf16 |
| Recompute Activations | yes |
| Distributed Optimizers | yes |
| Model Initialization | |

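Since the repository is tagged `llama`, these hyper-parameters plausibly map onto a Llama-style configuration. The sketch below is illustrative only: the model ships its own code (`trust_remote_code=True`), so this mapping is an assumption rather than something the card confirms.

```python
from transformers import LlamaConfig

# Hypothetical Llama-style rendering of the table above (illustrative only).
config = LlamaConfig(
    hidden_size=4096,              # Hidden Size
    intermediate_size=13440,       # FFN Hidden Size
    num_hidden_layers=32,          # Num Layers
    num_attention_heads=32,        # Num Attention Heads (Head Dim = 4096 / 32 = 128)
    num_key_value_heads=2,         # Grouped-query attention with 2 query groups
    max_position_embeddings=4096,  # Seq Length, with rotary position embeddings
    hidden_act="silu",             # SwiGLU activation
    attention_dropout=0.0,         # Attention Dropout
)
```
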
## Citation

**BibTeX:**

TODO

**APA:**

TODO

## Model Card Contact

You can reach out to the model card contact: [OpenGPT-X](https://huggingface.co/iwendler)