Update README.md
README.md CHANGED
@@ -137,9 +137,9 @@ This example demonstrates how to load the model and tokenizer, prepare input, ge

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

-[Teuken-7B-
+[Teuken-7B-instruct-research-v0.4](https://huggingface.co/openGPT-X/Teuken-7B-instruct-research-v0.4) was pre-trained on 4 trillion tokens of data from publicly available sources.
 The pretraining data has a cutoff of September 2023.
-More information
+More information is available in our [preprint "Data Processing for the OpenGPT-X Model Family"](http://arxiv.org/abs/2410.08800).

### Instruction-Tuning Data
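For context, the hunk header above references the README's usage example ("load the model and tokenizer, prepare input, generate..."). That example is not part of this diff, but a minimal sketch of what such loading code typically looks like is given below, assuming the standard Hugging Face `transformers` API. The `trust_remote_code` flag and the generation parameters are assumptions based on common practice for custom Hub models, not details confirmed by this commit; consult the model card for the exact snippet.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model referenced by this commit's diff.
model_name = "openGPT-X/Teuken-7B-instruct-research-v0.4"

# trust_remote_code is an assumption: models with custom architectures on the
# Hub require it, but this diff does not show the README's actual arguments.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# Prepare input and generate, as the README example referenced above describes.
prompt = "How does a large language model work?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```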