Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)

README.md +140 -8

@@ -1,15 +1,133 @@
 ---
 license: apache-2.0
 widget:
-  - example_title: "Yi-34B-Chat"
-    text: "hi"
-    output:
-      text: " Hello! How can I assist you today?"
-  - example_title: "Yi-34B"
-    text: "There's a place where time stands still. A place of breath taking wonder, but also"
-    output:
-      text: " an eerie sense that something is just not right…\nBetween the two worlds lies The Forgotten Kingdom - home to creatures long since thought extinct and ancient magic so strong it defies belief! Only here can you find what has been lost for centuries: An Elixir Of Life which will restore youth and vitality if only those who seek its power are brave enough to face up against all manner of dangers lurking in this mysterious land! But beware; some say there may even exist powerful entities beyond our comprehension whose intentions towards humanity remain unclear at best ---- they might want nothing more than destruction itself rather then anything else from their quest after immortality (and maybe someone should tell them about modern medicine)? In any event though – one thing remains true regardless : whether or not success comes easy depends entirely upon how much effort we put into conquering whatever challenges lie ahead along with having faith deep down inside ourselves too ;) So let’s get started now shall We?"
 pipeline_tag: text-generation
 ---
 <div align="center">
@@ -1410,3 +1528,17 @@ The code and weights of the Yi series models are distributed under the [Apache 2
 <p align="right"> [
   <a href="#top">Back to top ⬆️ </a>  ]
 </p>

 ---
 license: apache-2.0
 widget:
+- example_title: Yi-34B-Chat
+  text: hi
+  output:
+    text: ' Hello! How can I assist you today?'
+- example_title: Yi-34B
+  text: There's a place where time stands still. A place of breath taking wonder,
+    but also
+  output:
+    text: ' an eerie sense that something is just not right…
+      Between the two worlds lies The Forgotten Kingdom - home to creatures long since
+      thought extinct and ancient magic so strong it defies belief! Only here can
+      you find what has been lost for centuries: An Elixir Of Life which will restore
+      youth and vitality if only those who seek its power are brave enough to face
+      up against all manner of dangers lurking in this mysterious land! But beware;
+      some say there may even exist powerful entities beyond our comprehension whose
+      intentions towards humanity remain unclear at best ---- they might want nothing
+      more than destruction itself rather then anything else from their quest after
+      immortality (and maybe someone should tell them about modern medicine)? In any
+      event though – one thing remains true regardless : whether or not success comes
+      easy depends entirely upon how much effort we put into conquering whatever challenges
+      lie ahead along with having faith deep down inside ourselves too ;) So let’s
+      get started now shall We?'
 pipeline_tag: text-generation
+model-index:
+- name: Yi-9B
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 61.18
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=01-ai/Yi-9B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 78.82
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=01-ai/Yi-9B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 70.06
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=01-ai/Yi-9B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 42.45
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=01-ai/Yi-9B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 77.51
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=01-ai/Yi-9B
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 48.98
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=01-ai/Yi-9B
+      name: Open LLM Leaderboard
 ---
 <div align="center">
 <p align="right"> [
   <a href="#top">Back to top ⬆️ </a>  ]
 </p>
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_01-ai__Yi-9B)
+|             Metric              |Value|
+|---------------------------------|----:|
+|Avg.                             |63.17|
+|AI2 Reasoning Challenge (25-Shot)|61.18|
+|HellaSwag (10-Shot)              |78.82|
+|MMLU (5-Shot)                    |70.06|
+|TruthfulQA (0-shot)              |42.45|
+|Winogrande (5-shot)              |77.51|
+|GSM8k (5-shot)                   |48.98|
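
As a quick sanity check on the table above, the `Avg.` row can be recomputed from the six benchmark scores. This is a minimal sketch that assumes `Avg.` is the unweighted mean of the six values rounded to two decimals; the PR does not state how the average is derived, and the helper name `leaderboard_average` is hypothetical.

```python
from statistics import mean

# Scores copied verbatim from the results table in this PR.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 61.18,
    "HellaSwag (10-Shot)": 78.82,
    "MMLU (5-Shot)": 70.06,
    "TruthfulQA (0-shot)": 42.45,
    "Winogrande (5-shot)": 77.51,
    "GSM8k (5-shot)": 48.98,
}


def leaderboard_average(values):
    """Unweighted mean rounded to two decimals (assumed averaging rule)."""
    return round(mean(values), 2)


average = leaderboard_average(scores.values())
print(f"Recomputed Avg.: {average:.2f}")
```

Under that assumption the recomputed value agrees with the 63.17 reported in the `Avg.` row, so the table and the `model-index` metric values are at least mutually consistent.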