Update README.md
## Intended uses & limitations

This model was pretrained only in a self-supervised way, without any supervised training. You can use it for text generation or fine-tune it for a downstream task. The model followed a two-stage pretraining approach in which single-turn instruction-following examples were mixed in with the other training data during the second stage (explained in more detail later in this readme). Thanks to this approach, the pretrained model is already capable of instruction following, but you may get even better results if you fine-tune it specifically for instruction following or other use cases. For instruction-following fine-tuning, you should use the same prompt format showcased below.
### How to use

#### Fine-tuning
We have now added a fine-tuning example notebook along with a video! \
Notebook: https://huggingface.co/Finnish-NLP/Ahma-3B/blob/main/Finetune_Ahma_3B_example.ipynb \
Video: https://www.youtube.com/watch?v=6mbgn9XzpS4
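For a rough idea of the data preparation a fine-tuning run involves, single-turn instruction/response pairs can be rendered into full training strings in a Llama 2 style single-turn template. The helper below is a hypothetical sketch, not the notebook's actual code; verify the exact template and EOS handling against the notebook above.

```python
def to_training_example(instruction: str, response: str, eos_token: str = "</s>") -> str:
    """Render one instruction/response pair into a single training string
    using a Llama 2 style single-turn template (sketch; the real template
    and EOS token should be taken from the fine-tuning notebook)."""
    return f"[INST] {instruction} [/INST] {response}{eos_token}"
```

Such rendered strings would then be tokenized and used as ordinary causal language modeling examples.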
#### Inference

If you want to use this model for instruction following, you need to use the same prompt format we used in the second stage of the pretraining (basically the same format Meta used in their Llama 2 models). **Note: do not use "LlamaTokenizer" from the transformers library; always use AutoTokenizer instead, or use the plain sentencepiece tokenizer.** Here is an example using the instruction-following prompt format, with some generation arguments you can modify for your use:
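A minimal sketch of that Llama 2 style single-turn prompt format follows. The default system prompt text and the generation arguments shown in the comments are illustrative assumptions, not official values from this model card:

```python
# Sketch of the Llama 2 style single-turn chat template. The system prompt
# below is a made-up placeholder; use the system prompt from this model card.
DEFAULT_SYSTEM_PROMPT = (
    "Olet avulias tekoälyavustaja. Vastaa käyttäjän kysymyksiin suomeksi."
)  # hypothetical placeholder

def format_prompt(instruction: str, system_prompt: str = DEFAULT_SYSTEM_PROMPT) -> str:
    """Wrap a single-turn instruction in the Llama 2 chat template."""
    return f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{instruction} [/INST]"

# Inference would then look roughly like this (not run here; requires the
# transformers library and downloading the model weights):
#
#   from transformers import AutoTokenizer, AutoModelForCausalLM  # AutoTokenizer, as noted above
#   tokenizer = AutoTokenizer.from_pretrained("Finnish-NLP/Ahma-3B")
#   model = AutoModelForCausalLM.from_pretrained("Finnish-NLP/Ahma-3B")
#   inputs = tokenizer(format_prompt("Kerro kolme faktaa Suomesta."), return_tensors="pt")
#   outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True,
#                            temperature=0.6, repetition_penalty=1.2)
#   print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The sampling settings in the comments (temperature, repetition penalty) are starting points to experiment with, not tuned recommendations.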
### Limitations and bias
This model was trained only on Finnish texts, excluding code, so it should not be used for multilingual or code generation use cases.
The training data used for this model contains a lot of content from the internet, which is far from neutral. Therefore, the model can have biased predictions. This bias will also affect all fine-tuned versions of this model.
To reduce toxic content, the training data was filtered with a toxicity classifier, but no classifier can truly eliminate all toxic text.