Update README.md
README.md (CHANGED)
@@ -87,7 +87,7 @@ print(decoded[0])

## Evaluation

-We test the performance of our model with [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness). The evaluation results on common datasets are shown below. We test on AI2 Reasoning Challenge (25-shot), HellaSwag (10-shot), MMLU (5-shot), and Winogrande (5-shot).
+We test the performance of our model with [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness). The evaluation results on common datasets are shown below. We test on AI2 Reasoning Challenge (25-shot), HellaSwag (10-shot), MMLU (5-shot), and Winogrande (5-shot). We release Moxin-7B-finetuned as our base model. We further finetune our base model on Tulu v2 to obtain our chat model.

| Models | ARC-C | HellaSwag | MMLU | Winogrande | Ave |
|:----------------------:|:-----:|:---------:|:-----:|:---------:|:-----:|

@@ -122,7 +122,16 @@ We also test the zero shot performance on AI2 Reasoning Challenge (0-shot), AI2

| Moxin-7B-finetune | 80.03 | 75.17 | 82.24 | 81.12 | 58.64 | 75.44 |
+## Citation
+```
+@article{zhao2024fully,
+  title={Fully Open Source Moxin-7B Technical Report},
+  author={Zhao, Pu and Shen, Xuan and Kong, Zhenglun and Shen, Yixin and Chang, Sung-En and Rupprecht, Timothy and Lu, Lei and Nan, Enfu and Yang, Changdi and He, Yumei and others},
+  journal={arXiv preprint arXiv:2412.06845},
+  year={2024}
+}
+```
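For readers who want to reproduce the few-shot results referenced in this change, the sketch below shows how the stated settings (ARC-Challenge 25-shot, HellaSwag 10-shot, MMLU 5-shot, Winogrande 5-shot) map onto the lm-evaluation-harness Python API. It is a minimal illustration, not the authors' exact command: the Hugging Face model id `moxin-org/moxin-llm-7b` is an assumed placeholder, and the dtype and batch size are arbitrary choices.

```python
# Minimal sketch: run the four few-shot evaluations named in the README
# with lm-evaluation-harness (pip install lm-eval). The model id is an
# assumed placeholder -- substitute the released Moxin checkpoint path.
import lm_eval

MODEL_ID = "moxin-org/moxin-llm-7b"  # hypothetical Hugging Face id

# (lm-eval task name, number of few-shot examples) per the README text
TASK_SHOTS = [
    ("arc_challenge", 25),  # AI2 Reasoning Challenge (25-shot)
    ("hellaswag", 10),      # HellaSwag (10-shot)
    ("mmlu", 5),            # MMLU (5-shot)
    ("winogrande", 5),      # Winogrande (5-shot)
]

for task, shots in TASK_SHOTS:
    # simple_evaluate loads the model via the Hugging Face backend and
    # returns a dict whose "results" entry holds per-task metrics.
    out = lm_eval.simple_evaluate(
        model="hf",
        model_args=f"pretrained={MODEL_ID},dtype=bfloat16",
        tasks=[task],
        num_fewshot=shots,
        batch_size=8,
    )
    print(task, out["results"][task])
```

The tasks are run one call at a time because the shot count differs per benchmark; each call reloads the checkpoint, which is simple but slow, so for repeated runs you could instantiate the model object once and pass it in place of the `"hf"` string.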