stelterlab committed on
Commit 38b6a29 · verified · 1 Parent(s): 653f9c8

Update README.md


Added content of the original model card.

Files changed (1):
1. README.md +93 -0
README.md CHANGED
@@ -67,3 +67,96 @@ if hasattr(tokenizer, "apply_chat_template") and tokenizer.chat_template is not

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```

Original Model Card follows:

# Model Card for EuroLLM-9B-Instruct

This is the model card for EuroLLM-9B-Instruct. You can also check the pre-trained version: [EuroLLM-9B](https://huggingface.co/utter-project/EuroLLM-9B).

- **Developed by:** Unbabel, Instituto Superior Técnico, Instituto de Telecomunicações, University of Edinburgh, Aveni, University of Paris-Saclay, University of Amsterdam, Naver Labs, Sorbonne Université.
- **Funded by:** European Union.
- **Model type:** A 9B parameter multilingual transformer LLM.
- **Language(s) (NLP):** Bulgarian, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, German, Greek, Hungarian, Irish, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovenian, Spanish, Swedish, Arabic, Catalan, Chinese, Galician, Hindi, Japanese, Korean, Norwegian, Russian, Turkish, and Ukrainian.
- **License:** Apache License 2.0.

## Model Details

The EuroLLM project has the goal of creating a suite of LLMs capable of understanding and generating text in all European Union languages as well as some additional relevant languages.
EuroLLM-9B is a 9B parameter model trained on 4 trillion tokens divided across the considered languages and several data sources: Web data, parallel data (en-xx and xx-en), and high-quality datasets.
EuroLLM-9B-Instruct was further instruction tuned on EuroBlocks, an instruction tuning dataset with a focus on general instruction-following and machine translation.

### Model Description

EuroLLM uses a standard, dense Transformer architecture:
- We use grouped query attention (GQA) with 8 key-value heads, since it has been shown to increase speed at inference time while maintaining downstream performance (see the sketch after this list).
- We perform pre-layer normalization, since it improves training stability, and use RMSNorm, which is faster.
- We use the SwiGLU activation function, since it has been shown to lead to good results on downstream tasks.
- We use rotary positional embeddings (RoPE) in every layer, since these have been shown to lead to good performance while allowing the extension of the context length.
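To make the GQA point concrete, here is a minimal sketch of the head sharing with toy tensors (an illustration assuming PyTorch, not code from this repository; the head counts match the table below, the batch and sequence sizes are made up):

```python
# Grouped query attention: 32 query heads share 8 key-value heads,
# so each group of 32 / 8 = 4 query heads attends with the same K and V.
import torch

n_heads, n_kv_heads, head_dim, seq_len = 32, 8, 128, 16
q = torch.randn(1, n_heads, seq_len, head_dim)
k = torch.randn(1, n_kv_heads, seq_len, head_dim)
v = torch.randn(1, n_kv_heads, seq_len, head_dim)

# Expand the 8 KV heads to 32 by repeating each one 4 times; the KV
# cache stays 4x smaller, which is where the inference speedup comes from.
k = k.repeat_interleave(n_heads // n_kv_heads, dim=1)
v = v.repeat_interleave(n_heads // n_kv_heads, dim=1)

scores = q @ k.transpose(-2, -1) / head_dim**0.5
out = torch.softmax(scores, dim=-1) @ v
print(out.shape)  # torch.Size([1, 32, 16, 128])
```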

For pre-training, we use 400 Nvidia H100 GPUs of the MareNostrum 5 supercomputer, training the model with a constant batch size of 2,800 sequences, which at a sequence length of 4,096 corresponds to approximately 12 million tokens per step, using the Adam optimizer and BF16 precision.
Here is a summary of the model hyper-parameters:

| Hyper-parameter          | Value                |
|--------------------------|----------------------|
| Sequence Length          | 4,096                |
| Number of Layers         | 42                   |
| Embedding Size           | 4,096                |
| FFN Hidden Size          | 12,288               |
| Number of Heads          | 32                   |
| Number of KV Heads (GQA) | 8                    |
| Activation Function      | SwiGLU               |
| Position Encodings       | RoPE (Θ = 10,000)    |
| Layer Norm               | RMSNorm              |
| Tied Embeddings          | No                   |
| Embedding Parameters     | 0.524B               |
| LM Head Parameters       | 0.524B               |
| Non-embedding Parameters | 8.105B               |
| Total Parameters         | 9.154B               |
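The parameter rows can be sanity-checked from the shapes above. A back-of-the-envelope sketch (my own estimate, not the authors' accounting; norm weights are ignored and the vocabulary size is inferred from the embedding parameter count):

```python
# Rough reconstruction of the parameter counts in the table above.
d_model, d_ffn, n_layers = 4096, 12288, 42
n_heads, n_kv_heads = 32, 8
head_dim = d_model // n_heads  # 128

# Attention: full-size Q and output projections, GQA-reduced K and V.
attn = 2 * d_model * d_model + 2 * d_model * n_kv_heads * head_dim
# SwiGLU FFN uses three weight matrices (gate, up, down).
ffn = 3 * d_model * d_ffn
non_embedding = n_layers * (attn + ffn)

# Untied embeddings and LM head at 0.524B each imply a vocabulary of
# roughly 0.524e9 / 4096 ≈ 128k entries.
vocab = round(0.524e9 / d_model)
total = non_embedding + 2 * vocab * d_model

print(f"{non_embedding / 1e9:.3f}B non-embedding")  # ~8.103B vs. 8.105B
print(f"{total / 1e9:.3f}B total")                  # ~9.151B vs. 9.154B
```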

## Run the model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "utter-project/EuroLLM-9B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

messages = [
    {
        "role": "system",
        "content": "You are EuroLLM --- an AI assistant specialized in European languages that provides safe, educational and helpful answers.",
    },
    {
        "role": "user",
        "content": "What is the capital of Portugal? How would you describe it?",
    },
]

# Apply the model's chat template and generate a response.
inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
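For a 9B model you will likely want reduced precision on a single GPU. A minimal variant of the load step (a sketch; `torch_dtype` and `device_map` are standard `transformers` arguments, and `device_map="auto"` additionally requires the `accelerate` package):

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "utter-project/EuroLLM-9B-Instruct",
    torch_dtype=torch.bfloat16,  # halves memory vs. float32
    device_map="auto",           # places layers on available devices
)
# With device_map set, move inputs to the model's device before generating:
# outputs = model.generate(inputs.to(model.device), max_new_tokens=1024)
```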

## Results

### EU Languages

![image/png](https://cdn-uploads.huggingface.co/production/uploads/63f33ecc0be81bdc5d903466/ob_1sLM8c7dxuwpv6AAHA.png)

**Table 1:** Comparison of open-weight LLMs on multilingual benchmarks. The Borda count corresponds to the average ranking of the models (see [Colombo et al., 2022](https://arxiv.org/abs/2202.03799)). For Arc-challenge, Hellaswag, and MMLU we are using Okapi datasets ([Lai et al., 2023](https://aclanthology.org/2023.emnlp-demo.28/)), which include 11 languages. For MMLU-Pro and MUSR we translate the English version with Tower ([Alves et al., 2024](https://arxiv.org/abs/2402.17733)) to 6 EU languages.
\* As there are no public versions of the pre-trained models, we evaluated them using the post-trained versions.

The results in Table 1 highlight EuroLLM-9B's superior performance on multilingual tasks compared to other European-developed models (as shown by the Borda count of 1.0), as well as its strong competitiveness with non-European models, achieving results comparable to Gemma-2-9B and outperforming the rest on most benchmarks.

### English

![image/png](https://cdn-uploads.huggingface.co/production/uploads/63f33ecc0be81bdc5d903466/EfilsW_p-JA13mV2ilPkm.png)

**Table 2:** Comparison of open-weight LLMs on English general benchmarks.
\* As there are no public versions of the pre-trained models, we evaluated them using the post-trained versions.

The results in Table 2 demonstrate EuroLLM's strong performance on English tasks, surpassing most European-developed models and matching the performance of Mistral-7B (obtaining the same Borda count).

## Bias, Risks, and Limitations

EuroLLM-9B has not been aligned to human preferences, so the model may generate problematic outputs (e.g., hallucinations, harmful content, or false statements).