Ander Corral committed
Commit 70fc5ca
Parent(s): c4945e2

Update README.md

Files changed (1):
  1. README.md +212 -138

README.md CHANGED
@@ -1,199 +1,273 @@
  ---
  library_name: transformers
- tags: []
  ---

- # Model Card for Model ID

- <!-- Provide a quick summary of what the model is/does. -->

- ## Model Details
-
- ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->
-
- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-
- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]
-
- ### Model Sources [optional]
-
- <!-- Provide the basic links for the model. -->
-
- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]
-
- ## Uses
-
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
- ### Direct Use
-
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

- [More Information Needed]
-
- ### Downstream Use [optional]
-
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
- [More Information Needed]
-
- ### Out-of-Scope Use
-
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
- [More Information Needed]
-
- ## Bias, Risks, and Limitations
-
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
- [More Information Needed]
-
- ### Recommendations
-
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

- ## How to Get Started with the Model

- Use the code below to get started with the model.

- [More Information Needed]

  ## Training Details

- ### Training Data
-
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-
- [More Information Needed]
-
- ### Training Procedure

- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

- #### Preprocessing [optional]

- [More Information Needed]

- #### Training Hyperparameters

- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

- #### Speeds, Sizes, Times [optional]

- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

- [More Information Needed]

  ## Evaluation

- <!-- This section describes the evaluation protocols and provides the results. -->
-
- ### Testing Data, Factors & Metrics

- #### Testing Data

- <!-- This should link to a Dataset Card if possible. -->

- [More Information Needed]

- #### Factors

- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

- [More Information Needed]

- #### Metrics

- <!-- These are the evaluation metrics being used, ideally with a description of why. -->

- [More Information Needed]

- ### Results

- [More Information Needed]

- #### Summary

- ## Model Examination [optional]

- <!-- Relevant interpretability work for the model goes here -->

- [More Information Needed]

  ## Environmental Impact

- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]
-
- ## Technical Specifications [optional]
-
- ### Model Architecture and Objective
-
- [More Information Needed]
-
- ### Compute Infrastructure
-
- [More Information Needed]
-
- #### Hardware
-
- [More Information Needed]
-
- #### Software
-
- [More Information Needed]
-
- ## Citation [optional]
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

- **BibTeX:**

- [More Information Needed]

- **APA:**

- [More Information Needed]

- ## Glossary [optional]

- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

- [More Information Needed]

- ## More Information [optional]

- [More Information Needed]

- ## Model Card Authors [optional]

- [More Information Needed]

- ## Model Card Contact

- [More Information Needed]
  ---
+ license: llama3.1
+ datasets:
+ - orai-nlp/ZelaiHandi
+ - HuggingFaceFW/fineweb
+ language:
+ - eu
+ base_model: meta-llama/Meta-Llama-3.1-8B
+ pipeline_tag: text-generation
  library_name: transformers
  ---

+ # Llama-eus-8B, a foundational sub-10 billion parameter LLM for Basque
+
+ Llama-eus-8B v1.0 is a foundational large language model (LLM) adapted from Meta's [Llama3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B) and tailored specifically for the Basque language. Through continual pretraining on the [ZelaiHandi dataset](https://huggingface.co/datasets/orai-nlp/ZelaiHandi), containing approximately 1.5 billion high-quality Basque tokens, combined with a selected subset of around 300 million tokens from the [FineWeb dataset](https://huggingface.co/datasets/HuggingFaceFW/fineweb), Llama-eus-8B aims to enhance linguistic performance in Basque while maintaining general English capabilities.
+
+ The original Meta Llama 3.1 collection of models was trained on 15 trillion tokens, with some multilingual content supporting 7 additional languages besides English: French, German, Hindi, Italian, Portuguese, Spanish and Thai. However, it has limitations for lower-resource languages such as Basque, leading to grammatical inaccuracies and reduced fluency. To address this, Llama-eus-8B underwent specialized pretraining to improve understanding, coherence, and contextual relevance in Basque text, while also employing strategies to minimize catastrophic forgetting in English.
+
+ Evaluations show that Llama-eus-8B exhibits notable improvements in Basque on tasks requiring nuanced language understanding, cultural context awareness, and complex reasoning, with minimal degradation in performance for English.
+
+ ## Model Details
+
+ ### Model Description
+
+ - **Developed by:** [Orai NLP Technologies](https://huggingface.co/orai-nlp)
+ - **Model type:** Foundational LLM
+ - **License:** Llama 3.1 is [licensed](https://llama.meta.com/llama3_1/license/) under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.
+ - **Finetuned from model:** Built with Llama ([Llama3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B))
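+
+ Llama-eus-8B is a standard 🤗 Transformers causal language model. The snippet below is a minimal, hedged inference sketch, not an official quickstart: it assumes the checkpoint is published under the repository id `orai-nlp/Llama-eus-8B` (inferred from the citation at the end of this card) and that the Llama 3.1 license has been accepted on the Hub.
+
+ ```python
+ # Minimal inference sketch; the repository id is an assumption inferred
+ # from this card, not a confirmed identifier.
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "orai-nlp/Llama-eus-8B"  # assumed repository id
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype=torch.bfloat16,  # bf16 is a reasonable default on A100-class hardware
+     device_map="auto",
+ )
+
+ # Foundational (non-instruct) model: use plain text completion, not a chat template.
+ prompt = "Euskal Herriko hiriburuak honako hauek dira:"  # "The capitals of the Basque Country are:"
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```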
+
  ## Training Details
+
+ ### Training Data
+
+ For continual pre-training (CPT), we leveraged a combination of Basque and English data to enhance linguistic performance in Basque while maintaining general English capabilities. The goal is to improve cross-lingual transfer while retaining the model's proficiency in English. A sketch of how such a mix can be assembled follows the list below.
+
+ - [**ZelaiHandi**](https://huggingface.co/datasets/orai-nlp/ZelaiHandi) (San Vicente et al., 2024): ZelaiHandi is the largest collection of freely licensed, high-quality Basque texts gathered from selected web sources. It comprises approximately 521 million words, corresponding to 1.5 billion tokens (Llama 3.1 tokenizer).
+
+ - [**FineWeb**](https://huggingface.co/datasets/HuggingFaceFW/fineweb) (Penedo et al., 2024): FineWeb consists of more than 15T tokens of cleaned and deduplicated English web data from CommonCrawl. We selected a random subset of around 300 million tokens (Llama 3.1 tokenizer).
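+
+ The card does not publish the exact mixing recipe; the following is a minimal sketch with 🤗 Datasets, assuming both corpora expose a `text` column, that ZelaiHandi ships a `train` split, and that roughly 200k FineWeb documents approximate the ~300M-token English subset.
+
+ ```python
+ # Hedged sketch of the CPT data mix (~1.5B Basque + ~0.3B English tokens).
+ # Column names, subset choice, and document count are assumptions, not the
+ # authors' released recipe.
+ from datasets import load_dataset, concatenate_datasets
+
+ basque = load_dataset("orai-nlp/ZelaiHandi", split="train").select_columns(["text"])
+
+ # "sample-10BT" is one of FineWeb's published sample configurations.
+ english = load_dataset("HuggingFaceFW/fineweb", name="sample-10BT", split="train")
+ english = english.shuffle(seed=42).select(range(200_000)).select_columns(["text"])
+
+ # Interleave the two corpora into a single shuffled CPT stream.
+ mix = concatenate_datasets([basque, english]).shuffle(seed=42)
+ print(mix)
+ ```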
 
+ ### Training Procedure
+
+ Llama-eus-8B was trained within the 🤗 Transformers ecosystem, utilizing 🤗 Accelerate and DeepSpeed ZeRO for efficient large-scale model training. The process was conducted on the Hyperion system at the Donostia International Physics Center (DIPC), leveraging 8x NVIDIA A100 80GB SXM4 GPUs.
+
+ The model was trained with a sequence length of 4096 tokens and an effective batch size of approximately 2 million tokens, over 4 epochs, resulting in a total of around 7.2 billion tokens processed. A cosine learning rate schedule was used, with a peak learning rate of 1e-4 and a warm-up phase comprising 10% of the total steps. All remaining hyperparameters followed the configurations established by Touvron et al. (2023).
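+
+ These hyperparameters map onto a 🤗 Transformers configuration roughly as follows. This is a hedged reconstruction, not the authors' released script: the batch factorization (8 GPUs x 4 sequences x 16 accumulation steps x 4096 tokens ≈ 2M tokens per step), bf16 precision, and the DeepSpeed config path are all assumptions consistent with the card.
+
+ ```python
+ # Hedged reconstruction of the CPT setup described above.
+ from transformers import TrainingArguments
+
+ args = TrainingArguments(
+     output_dir="llama-eus-8b-cpt",
+     num_train_epochs=4,                # ~7.2B tokens over the ~1.8B-token mix
+     per_device_train_batch_size=4,     # one possible factorization of the
+     gradient_accumulation_steps=16,    # ~2M-token effective batch on 8 GPUs
+     learning_rate=1e-4,                # peak learning rate
+     lr_scheduler_type="cosine",
+     warmup_ratio=0.10,                 # warm-up over 10% of total steps
+     bf16=True,                         # assumption: bf16 on A100s
+     deepspeed="ds_zero3_config.json",  # hypothetical DeepSpeed ZeRO config file
+     logging_steps=10,
+     save_strategy="epoch",
+ )
+ ```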
 
 
+
  ## Evaluation
+
+ ### Testing Data
+
+ To evaluate our model, we created Basque versions of well-established English benchmarks by manually translating a selected subset of each dataset. This enabled us to rigorously assess Llama-eus-8B's performance in Basque and compare it directly with its performance in English across tasks such as linguistic understanding, reasoning, and contextual comprehension, providing a comprehensive evaluation of the model's multilingual capabilities.
+
+ - **ARCeu** (Corral et al., 2024) [25-shot]: A subset of 250 samples manually translated to Basque from the [ARC dataset](https://huggingface.co/datasets/allenai/ai2_arc) (Clark et al., 2018). The corresponding 250 English samples are also provided. The ARC dataset consists of genuine grade-school level, multiple-choice science questions, assembled to encourage research in advanced question-answering.
+
+ - **WinoGrandeeu** (Corral et al., 2024) [5-shot]: A subset of 250 samples manually translated to Basque from the [WinoGrande dataset](https://huggingface.co/datasets/allenai/winogrande) (Sakaguchi et al., 2019). The corresponding 250 English samples are also provided. The WinoGrande dataset is a collection of 44k problems, inspired by the Winograd Schema Challenge but adjusted to improve scale and robustness against dataset-specific bias. Formulated as a fill-in-the-blank task with binary options, the goal is to choose the right option for a given sentence, which requires commonsense reasoning.
+
+ - **MMLUeu** (Corral et al., 2024) [5-shot]: A subset of 270 samples manually translated to Basque from the [MMLU dataset](https://huggingface.co/datasets/cais/mmlu) (Hendrycks et al., 2020). The corresponding 270 English samples are also provided. The MMLU dataset is a massive multitask test consisting of multiple-choice questions from various branches of knowledge, spanning subjects in the humanities, social sciences, hard sciences, and other areas.
+
+ - **HellaSwageu** (Corral et al., 2024) [10-shot]: A subset of 250 samples manually translated to Basque from the [HellaSwag dataset](https://huggingface.co/datasets/Rowan/hellaswag) (Zellers et al., 2019). The corresponding 250 English samples are also provided. HellaSwag is a dataset for commonsense natural language inference.
+
+ Additionally, we evaluated our model on a suite of publicly available Basque benchmarks:
+
+ - [**Belebele**](https://huggingface.co/datasets/facebook/belebele) (Bandarkar et al.) [5-shot]: Belebele is a multiple-choice machine reading comprehension (MRC) dataset spanning 122 language variants.
+
+ - [**X-StoryCloze**](https://huggingface.co/datasets/juletxara/xstory_cloze) (Lin et al.) [0-shot]: X-StoryCloze consists of professionally translated versions of the English StoryCloze dataset in 10 non-English languages. It is a commonsense reasoning framework for evaluating story understanding, story generation, and script learning.
+
+ - [**BasqueGLUE**](https://huggingface.co/datasets/orai-nlp/basqueGLUE) (Urbizu et al.) [5-shot]: BasqueGLUE is an NLU benchmark for Basque, elaborated from previously existing datasets following criteria similar to those used for the construction of GLUE and SuperGLUE.
+
+ - [**EusProficiency**](https://huggingface.co/datasets/HiTZ/EusProficiency) (Etxaniz et al., 2024) [5-shot]: EusProficiency comprises 5,169 exercises on different topics from past EGA exams, the official C1-level certificate of proficiency in Basque.
+
+ - [**EusReading**](https://huggingface.co/datasets/HiTZ/EusReading) (Etxaniz et al., 2024) [1-shot]: EusReading consists of 352 reading comprehension exercises sourced from past EGA (C1 Basque certificate) exams from 1998 to 2008.
+
+ - [**EusTrivia**](https://huggingface.co/datasets/HiTZ/EusTrivia) (Etxaniz et al., 2024) [5-shot]: EusTrivia consists of 1,715 trivia questions from multiple online sources. A significant portion of the questions focus specifically on the Basque Country, its language and culture.
+
+ - [**EusExams**](https://huggingface.co/datasets/HiTZ/EusExams) (Etxaniz et al., 2024) [5-shot]: EusExams is a collection of tests designed to prepare individuals for Public Service examinations conducted by several Basque institutions, including the public health system Osakidetza, the Basque Government, the City Councils of Bilbao and Gasteiz, and the University of the Basque Country (UPV/EHU).
+ ### Results
+
+ For the evaluation, we compare our model against the Latxa models (Etxaniz et al., 2024) to assess its performance and effectiveness on Basque language tasks.
+ Latxa is a family of large language models specifically developed for Basque, with parameter sizes ranging from 7 billion to 70 billion.
+ As the only existing models adapted to Basque, the Latxa models provide a valuable baseline for our comparison.
+
+ Additionally, we compare our model against Meta's Llama 3.1 models (Dubey et al., 2024), including the 8B and 70B versions.
+ The Meta-Llama-3.1-8B model serves as the base model for our continual pre-training, providing a baseline for evaluating the improvements achieved through our approach.
+
+ Model evaluations were conducted with the [LM Evaluation Harness library](https://github.com/EleutherAI/lm-evaluation-harness) from EleutherAI; a minimal invocation sketch is shown below.
+
+ We divide the evaluation into sub-10 billion parameter models and over-10 billion parameter models to better understand the performance differences across model sizes.
+ This distinction allows for a fairer comparison of our model against both smaller and larger scale models.
+
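+ The sketch below shows one way to run such an evaluation with the harness's Python API (v0.4+). The task identifiers are placeholders, since the card does not list the harness task names used, and the repository id is the same assumption as in the inference sketch above.
+
+ ```python
+ # Hedged example of scoring the model with EleutherAI's LM Evaluation Harness.
+ # Task ids below are placeholders, not confirmed task names.
+ import lm_eval
+
+ results = lm_eval.simple_evaluate(
+     model="hf",
+     model_args="pretrained=orai-nlp/Llama-eus-8B,dtype=bfloat16",  # assumed repo id
+     tasks=["eus_exams", "eus_proficiency", "eus_trivia"],          # placeholder task ids
+     num_fewshot=5,                                                 # shot counts per the list above
+     batch_size=8,
+ )
+ print(results["results"])
+ ```
+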
+ <style>
+ .tb td {
+   text-align: center;
+ }
+ .tb th {
+   padding-top: .5714286em;
+   background-color: #E5E7E9;
+ }
+
+ .tb-eu-l10 tr:nth-child(2) {
+   border-bottom-width: 2px;
+   border-color: black;
+ }
+
+ .tb-en-l10 tr:nth-child(2) {
+   border-bottom-width: 2px;
+   border-color: black;
+ }
+
+ .tb-eu-g10 tr:nth-child(3) td:nth-child(6) { background: #ABEBC6; }
+ .tb-eu-g10 tr:nth-child(3) td:nth-child(10) { background: #ABEBC6; }
+ .tb-eu-g10 tr:nth-child(3) td:nth-child(13) { background: #ABEBC6; }
+ .tb-eu-g10 tr:nth-child(3) td:nth-child(14) { background: #ABEBC6; }
+ .tb-eu-g10 tr:nth-child(3) td:nth-child(2) { background: #58D68D; }
+ .tb-eu-g10 tr:nth-child(3) td:nth-child(5) { background: #58D68D; }
+ .tb-eu-g10 tr:nth-child(3) td:nth-child(7) { background: #58D68D; }
+ .tb-eu-g10 tr:nth-child(3) td:nth-child(9) { background: #58D68D; }
+ .tb-eu-g10 tr:nth-child(3) td:nth-child(11) { background: #58D68D; }
+ .tb-eu-g10 tr:nth-child(3) {
+   border-bottom-width: 2px;
+   border-color: black;
+ }
+
+ .tb-en-g10 tr:nth-child(3) td:nth-child(2) { background: #ABEBC6; }
+ .tb-en-g10 tr:nth-child(3) td:nth-child(7) { background: #ABEBC6; }
+ .tb-en-g10 tr:nth-child(3) td:nth-child(4) { background: #58D68D; }
+ .tb-en-g10 tr:nth-child(3) td:nth-child(5) { background: #58D68D; }
+ .tb-en-g10 tr:nth-child(3) td:nth-child(6) { background: #58D68D; }
+ .tb-en-g10 tr:nth-child(3) td:nth-child(8) { background: #58D68D; }
+ .tb-en-g10 tr:nth-child(3) {
+   border-bottom-width: 2px;
+   border-color: black;
+ }
+ </style>
+
+ #### Sub-10 billion parameter results
+
+ Table 1 and Table 2 present the performance of sub-10 billion parameter models on the Basque and English benchmarks, respectively.
+ We compare our Llama-eus-8B model with the Basque model latxa-7b-v1.2, and also report results for the base model Meta-Llama-3.1-8B.
+
+ <div class="tb tb-eu-l10">
+
+ | Models | BL2MP | ARC | Winogrande | MMLU | HellaSwag | Belebele | X-StoryCloze | EusExams | EusProficiency | EusReading | EusTrivia | BasqueGLUE | Average |
+ |------------------------------|-------|-------|------------|-------|-----------|----------|--------------|----------|----------------|------------|-----------|------------|---------|
+ | latxa-7b-v1.2 | **89.33** | 54.80 | 65.60 | 34.44 | 61.20 | 37.33 | 65.45 | 33.82 | 30.26 | 26.99 | 42.16 | 52.56 | 49.50 |
+ | **Llama-eus-8B** | 89.22 | **55.20** | **67.20** | **53.33** | **63.60** | **73.44** | **65.72** | **52.51** | **48.44** | **54.55** | **56.21** | **55.27** | **61.22** |
+ | Meta-Llama-3.1-8B | 60.50 | 42.80 | 56.80 | 48.52 | 46.80 | 61.78 | 55.66 | 45.65 | 32.50 | 43.18 | 44.49 | 46.33 | 48.75 |
+
+ </div>
+ <div style="text-align: center; margin-top: -2em; font-style: italic;">
+
+ Table 1: **Performance on Basque** test sets for sub-10 billion parameter models. The best performing model is highlighted in bold.
+
+ </div>
+
+ Llama-eus-8B consistently outperforms the other two models across all test sets, with only a minor drop on BL2MP, achieving the highest average score of 61.22.
+ This highlights the effectiveness of our continual pre-training strategy, which significantly enhances Basque performance compared to the base model Meta-Llama-3.1-8B.
+
+ <div class="tb tb-en-l10">
+
+ | Models | ARC | Winogrande | MMLU | HellaSwag | Belebele | X-StoryCloze | Average |
+ |------------------------------|-------|------------|-------|-----------|----------|--------------|---------|
+ | latxa-7b-v1.2 | 61.20 | 75.60 | 38.15 | 76.40 | 41.56 | 73.66 | 61.10 |
+ | **Llama-eus-8B** | 67.60 | 78.40 | 62.59 | 86.40 | 84.67 | **78.49** | 76.36 |
+ | Meta-Llama-3.1-8B | **69.20** | **82.00** | **66.67** | **86.40** | **87.44** | 78.23 | **78.32** |
+
+ </div>
+ <div style="text-align: center; margin-top: -2em; font-style: italic;">
+
+ Table 2: **Performance on English** test sets for sub-10 billion parameter models. The best performing model is highlighted in bold.
+
+ </div>
+
+ In English benchmarks, the Meta-Llama-3.1-8B model leads in most categories, showing strong overall performance.
+ However, Llama-eus-8B performs notably well, with only a 2-point decrease on average, highlighting the effectiveness of continual pre-training on both Basque and English data to avoid catastrophic forgetting.
+
+
+ #### Over-10 billion parameter results
+
+ Table 3 and Table 4 present the performance of our Llama-eus-8B model against over-10 billion parameter models on the Basque and English benchmarks, respectively.
+ We compare our Llama-eus-8B model with the 13B and 70B versions of Latxa and with the 70B version of Meta's Llama 3.1.
+
+ <div class="tb tb-eu-g10">
+
+ | Models | BL2MP | ARC | Winogrande | MMLU | HellaSwag | Belebele | X-StoryCloze | EusExams | EusProficiency | EusReading | EusTrivia | BasqueGLUE | Average |
+ |-------------------------------|--------|-------|------------|--------|-----------|----------|--------------|----------|----------------|------------|-----------|------------|---------|
+ | latxa-13b-v1.2 | 88.67 | 55.60 | 69.60 | 39.63 | 61.60 | 53.89 | 66.51 | 43.66 | 44.11 | 34.94 | 56.38 | 53.36 | 55.66 |
+ | latxa-70b-v1.2 | 88.72 | 64.80 | **72.80** | 47.78 | **67.20** | 71.67 | **70.55** | 51.90 | **60.65** | 52.27 | **62.45** | 59.74 | 64.21 |
+ | **Llama-eus-8B** | **89.22** | 55.20 | 67.20 | 53.33 | 63.60 | 73.44 | 65.72 | 52.51 | 48.44 | 54.55 | 56.21 | 55.27 | 61.22 |
+ | Meta-Llama-3.1-70B | 67.89 | **67.20** | 70.00 | **63.70** | 63.60 | **87.67** | 65.98 | **64.62** | 44.86 | **72.44** | 60.23 | **63.50** | **65.97** |
+
+ </div>
+ <div style="text-align: center; margin-top: -2em; font-style: italic;">
+
+ Table 3: **Performance on Basque** test sets for over-10 billion parameter models. The best performing model is highlighted in bold. Light green indicates that Llama-eus-8B surpasses the 13B model, while dark green highlights that Llama-eus-8B outperforms both Basque-adapted systems (13B and 70B).
+
+ </div>
+
+ Table 3 shows that Llama-eus-8B outperforms the Latxa-13B model and performs competitively with the Latxa-70B model across various Basque benchmarks.
+ While the Latxa-70B model excels in several categories, particularly in Basque-specific tasks, Llama-eus-8B still achieves a high average score of 61.22 with far fewer parameters.
+ This indicates a favorable trade-off between parameter size and performance, with Llama-eus-8B providing strong results without requiring the largest model size.
+
+ <div class="tb tb-en-g10">
+
+ | Models | ARC | Winogrande | MMLU | HellaSwag | Belebele | X-StoryCloze | Average |
+ |-------------------------------|--------|------------|---------|-----------|----------|--------------|---------|
+ | latxa-13b-v1.2 | 66.80 | 80.80 | 47.41 | 83.20 | 63.44 | 76.51 | 69.69 |
+ | latxa-70b-v1.2 | 70.00 | 84.80 | 51.48 | 86.00 | 81.78 | 78.76 | 75.47 |
+ | **Llama-eus-8B** | 67.60 | 78.40 | 62.59 | 86.40 | 84.67 | 78.49 | 76.36 |
+ | Meta-Llama-3.1-70B | **78.40** | **85.60** | **72.22** | **92.00** | **94.44** | **81.01** | **83.95** |
+
+ </div>
+ <div style="text-align: center; margin-top: -2em; font-style: italic;">
+
+ Table 4: **Performance on English** test sets for over-10 billion parameter models. The best performing model is highlighted in bold. Light green indicates that Llama-eus-8B surpasses the 13B model, while dark green highlights that Llama-eus-8B outperforms both Basque-adapted systems (13B and 70B).
+
+ </div>
+
+ Table 4 shows that the Meta-Llama-3.1-70B model leads in English benchmarks, achieving the highest average score of 83.95.
+ The larger parameter count of Meta-Llama-3.1-70B contributes to its superior performance across most English tasks.
+ Llama-eus-8B nevertheless competes closely with the much larger Latxa models despite having far fewer parameters.
 
+
  ## Environmental Impact
+
+ Carbon emissions were estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** 8x NVIDIA A100 80GB SXM4
+ - **Hours used:** 561.4 GPU hours
+ - **Hardware Provider:** Donostia International Physics Center (DIPC)
+ - **Compute Region:** Spain
+ - **Carbon Emitted:** 97.01 kg CO2 eq
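+
+ As a back-of-envelope check of this estimate (our arithmetic, not the authors'): assuming an average draw of 400 W per A100 and a grid carbon intensity of about 0.432 kg CO2eq/kWh, both assumed values, the reported figure follows directly.
+
+ ```python
+ # Back-of-envelope CO2 estimate in the style of the ML Impact calculator.
+ # The power draw and grid intensity below are assumptions chosen for
+ # illustration; only the 561.4 GPU hours come from the card.
+ gpu_hours = 561.4      # total GPU hours reported above
+ power_kw = 0.400       # assumed average draw per A100 80GB SXM4
+ intensity = 0.432      # assumed kg CO2eq per kWh
+
+ energy_kwh = gpu_hours * power_kw       # ~224.6 kWh
+ emissions_kg = energy_kwh * intensity   # ~97.0 kg CO2eq
+ print(f"{energy_kwh:.1f} kWh -> {emissions_kg:.2f} kg CO2eq")
+ ```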
 
 
246
 
247
+ ## License
248
 
249
+ Llama 3.1 is [licensed](https://llama.meta.com/llama3_1/license/) under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.
250
 
 
251
 
252
+ ## Acknowledgments
253
 
254
+ This work is part of the BasqueLLM project, titled "First steps towards an artificial intelligence in Basque based on LLMs" (EXP: 2023-CIEN-000081-01), partially funded by the Guipuzcoa Science, Technology and Innovation Network Program of the Provincial Council of Gipuzkoa. Model training and development were conducted using the Hyperion system at the Donostia International Physics Center (DIPC).
255
 
 
256
 
257
+ ## Citation
258
 
259
+ If you use Llama-eus-8B please cite the following reference:
260
 
261
+ ```bibtex
262
+ @misc{Llama-eus,
263
+ title = {Llama-eus-8B, a foundational sub-10 billion parameter LLM for Basque},
264
+ author = {Ander Corral, Ixak Sarasua and Xabier Saralegi},
265
+ publisher = {Orai NLP Technologies},
266
+ url = {\url{https://huggingface.co/datasets/orai-nlp/Llama-eus-8B}},
267
+ year = 2024 }
268
+ ```
269
 
270
+ ## Contact
271
 
272
+ - Ander Corral ([email protected])
273
+ - Xabier Saralegi ([email protected])