Text Generation · Transformers · Safetensors · llama · text-generation-inference · Inference Endpoints
mfromm committed · Commit f8ed8af · verified · Parent: 6836a9a

Update README.md

Files changed (1): README.md (+14 -7)
README.md CHANGED
@@ -202,15 +202,22 @@ More information regarding the pre-training is available in our model preprint

 <!-- This section describes the evaluation protocols and provides the results. -->

-More information regarding our translated benchmarks is available in our preprint ["Towards Multilingual LLM Evaluation for European Languages"](https://arxiv.org/abs/2410.08928).
-
-### Testing Data, Factors & Metrics
-
-#### Testing Data
-
-<!-- This should link to a Dataset Card if possible. -->
-
-The model was evaluated in 21 languages on ARC, GSM8K, HellaSwag, TruthfulQA, Translation and MMLU. Results can be seen in the [European LLM Leaderboard](https://huggingface.co/spaces/openGPT-X/european-llm-leaderboard).
+Results on multilingual benchmarks for 21 European languages with instruction-tuned models. The Avg. column is the mean over the four benchmarks; the best score per column is bold, the second best italic.
+
+| Model                      | Avg.     | EU21-ARC | EU21-HeSw | EU21-TQA | EU21-MMLU |
+|----------------------------|----------|----------|-----------|----------|-----------|
+| Meta-Llama-3.1-8B-Instruct | **.563** | .563     | .579      | .532     | **.576**  |
+| Mistral-7B-Instruct-v0.3   | .527     | .530     | .538      | **.548** | _.491_    |
+| Salamandra-7B-Instruct     | _.543_   | **.595** | **.637**  | .482     | .459      |
+| Aya-23-8B                  | .485     | .475     | .535      | .476     | .455      |
+| Occiglot-7B-eu5-Instruct   | .475     | .484     | .519      | .471     | .428      |
+| Pharia-1-LLM-7B-C-A        | .417     | .396     | .438      | .469     | .366      |
+| Bloomz-7B1                 | .358     | .316     | .354      | .461     | .302      |
+| **Ours (Base)**            | .496     | .550     | .615      | .469     | .349      |
+| **Ours (Instruct)**        | _.543_   | _.581_   | _.624_    | _.543_   | .425      |
+
+More information regarding our translated benchmarks is available in our preprint ["Towards Multilingual LLM Evaluation for European Languages"](https://arxiv.org/abs/2410.08928).
+
+The model was evaluated in 21 languages on ARC, GSM8K, HellaSwag, TruthfulQA, Translation and MMLU. Results can also be seen in the [European LLM Leaderboard](https://huggingface.co/spaces/openGPT-X/european-llm-leaderboard).

 ## Technical Specifications
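
The Avg. column added above is consistent with an unweighted mean of the four EU21 scores. A minimal Python sketch checking this against one row (the mean-of-four rule is inferred from the numbers, not stated in the card):

```python
# Sanity check of the "Avg." column in the table above. Assumption: Avg. is
# the unweighted mean of the four EU21 benchmark scores.
scores = {  # values copied from the Meta-Llama-3.1-8B-Instruct row
    "EU21-ARC": 0.563,
    "EU21-HeSw": 0.579,
    "EU21-TQA": 0.532,
    "EU21-MMLU": 0.576,
}

avg = sum(scores.values()) / len(scores)
print(avg)  # ~0.5625, matching the reported Avg. of .563 up to rounding
```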