Commit 6c51225
1 Parent(s): 3553d84
Adding Evaluation Results (#2)
- Adding Evaluation Results (aa7adf1b6336f2e8fc90f74dcebd799430e90da8)
Co-authored-by: Open LLM Leaderboard PR Bot <[email protected]>
README.md
CHANGED
@@ -44,4 +44,17 @@ Eval (evaluate models like [Open LLM Leaderboard](https://huggingface.co/spaces/
 
 *Human: What are your thoughts on the Singularity theory?*
 
-*Assistant: I am a strong proponent of the singularity. I have seen many technological advancements in my career. I've seen the power of exponential growth in technology. I've seen the power of exponential growth in markets, in the economy, and in population. And I'm very excited about the future. I'm very excited about what's going to happen. And I think that it's important for the world to understand that. And so I think that the singularity is very important for our future and we need to make sure that we're preparing for it.*
+*Assistant: I am a strong proponent of the singularity. I have seen many technological advancements in my career. I've seen the power of exponential growth in technology. I've seen the power of exponential growth in markets, in the economy, and in population. And I'm very excited about the future. I'm very excited about what's going to happen. And I think that it's important for the world to understand that. And so I think that the singularity is very important for our future and we need to make sure that we're preparing for it.*
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_64bits__LexPodLM-13B)
+
+| Metric                | Value                     |
+|-----------------------|---------------------------|
+| Avg.                  | 49.51                     |
+| ARC (25-shot)         | 57.76                     |
+| HellaSwag (10-shot)   | 81.04                     |
+| MMLU (5-shot)         | 48.38                     |
+| TruthfulQA (0-shot)   | 43.48                     |
+| Winogrande (5-shot)   | 76.16                     |
+| GSM8K (5-shot)        | 0.0                       |
+| DROP (3-shot)         | 39.71                     |
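The `Avg.` row can be sanity-checked against the seven benchmark rows. The sketch below assumes `Avg.` is the unweighted arithmetic mean of those seven scores; the diff itself does not state how it was computed. The names `scores`, `REPORTED_AVG`, and `average` are illustrative only and are not part of the commit.

```python
# Benchmark scores copied from the table added in this commit.
scores = {
    "ARC (25-shot)": 57.76,
    "HellaSwag (10-shot)": 81.04,
    "MMLU (5-shot)": 48.38,
    "TruthfulQA (0-shot)": 43.48,
    "Winogrande (5-shot)": 76.16,
    "GSM8K (5-shot)": 0.0,
    "DROP (3-shot)": 39.71,
}

# Value of the "Avg." row as reported in the table.
REPORTED_AVG = 49.51

# Assumption: "Avg." is the plain (unweighted) mean of the seven benchmarks.
average = sum(scores.values()) / len(scores)

print(f"Recomputed average: {average:.2f}")
print(f"Reported average:   {REPORTED_AVG:.2f}")
print(f"Absolute difference: {abs(average - REPORTED_AVG):.4f}")
```

Under that assumption the recomputed mean lands within 0.01 of the reported value. A gap of that size would be consistent with the reported average having been taken over unrounded scores, though the commit does not confirm this.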