Here's a "continued pre-trained" model using Finnish Wikipedia dataset. I still don't understand why no one in Finland has figured out that they could just do continued pre-training on existing models that are already supported by every frontend.. I've seen Japanese models perform pretty well with that kind of continued pre-training, yet Finnish models are still done from scratch which means they suck ass. If you compare them to Llama 3 or Gemma 2 they just suck so much. They can't even match Mistral 7B a model from last year. Just stop wasting money on training models from scratch, use these better models as base and train it on all your closed-source data I don't have access to. Thank you.

LoRA: mpasila/Llama-3.2-Finnish-Wikipedia-LoRA-1B
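
If you'd rather apply the adapter yourself instead of using the merged weights, here's a minimal sketch with Transformers + PEFT (not part of the card; the prompt and generation settings are just for illustration):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model this card says it was finetuned from, in BF16
# to match the published tensor type.
base = AutoModelForCausalLM.from_pretrained(
    "unsloth/Llama-3.2-1B", torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("unsloth/Llama-3.2-1B")

# Attach the LoRA adapter, then fold it into the base weights so the
# result behaves like the merged checkpoint.
model = PeftModel.from_pretrained(base, "mpasila/Llama-3.2-Finnish-Wikipedia-LoRA-1B")
model = model.merge_and_unload()

# This is a base model, so use plain continuation rather than chat prompts.
inputs = tokenizer("Suomen pääkaupunki on", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=40)[0]))
```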

Trained with regular LoRA (not quantized/QLoRA), with LoRA rank 128 and alpha set to 32. Trained for 1 epoch on an RTX 4090 for about 12.5 hours.
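
The card doesn't include the training script, but the stated settings (rank 128, alpha 32, no quantization, 1 epoch) fit the usual Unsloth + TRL recipe roughly as below. Everything not stated above, including the exact Wikipedia dump, sequence length, batch size, and learning rate, is an assumption:

```python
from unsloth import FastLanguageModel
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

# Base model in full precision (no 4-bit loading, i.e. LoRA rather than QLoRA).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-1B",
    max_seq_length=2048,   # assumed
    load_in_4bit=False,    # regular LoRA, as stated above
)

# LoRA rank 128 and alpha 32, as stated above; the target modules are the
# usual Unsloth defaults for Llama-style models.
model = FastLanguageModel.get_peft_model(
    model,
    r=128,
    lora_alpha=32,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
)

# Assumed dataset: the Finnish subset of a Wikipedia dump on the Hub.
dataset = load_dataset("wikimedia/wikipedia", "20231101.fi", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,   # assumed
        gradient_accumulation_steps=4,   # assumed
        num_train_epochs=1,              # 1 epoch, as stated above
        learning_rate=2e-4,              # assumed
        bf16=True,
        output_dir="outputs",
    ),
)
trainer.train()
```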

It does still have some issues, but I might try training on Gemma 2 2B next to see whether that makes a better base for this (Gemma 2 is already better at Finnish than Llama 3), and maybe add more Finnish-language datasets.

Evaluation

| Model | Size | Type | FIN-bench (score) | Without math |
|---|---|---|---|---|
| mpasila/Llama-3.2-Finnish-Wikipedia-1B | 1B | Base | 0.3170 | 0.4062 |
| unsloth/Llama-3.2-1B | 1B | Base | 0.4029 | 0.3881 |
| Finnish-NLP/llama-7b-finnish | 7B | Base | 0.2350 | 0.4203 |
| LumiOpen/Viking-7B (1000B) | 7B | Base | 0.3721 | 0.4453 |
| HPLT/gpt-7b-nordic-prerelease | 7B | Base | 0.3169 | 0.4524 |

Source

FIN-bench scores:

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| bigbench_analogies | 0 | multiple_choice_grade | 0.4846 | ± 0.0440 |
| bigbench_arithmetic_1_digit_addition | 0 | multiple_choice_grade | 0.0300 | ± 0.0171 |
| bigbench_arithmetic_1_digit_division | 0 | multiple_choice_grade | 0.0435 | ± 0.0435 |
| bigbench_arithmetic_1_digit_multiplication | 0 | multiple_choice_grade | 0.0200 | ± 0.0141 |
| bigbench_arithmetic_1_digit_subtraction | 0 | multiple_choice_grade | 0.0700 | ± 0.0256 |
| bigbench_arithmetic_2_digit_addition | 0 | multiple_choice_grade | 0.2200 | ± 0.0416 |
| bigbench_arithmetic_2_digit_division | 0 | multiple_choice_grade | 0.0800 | ± 0.0273 |
| bigbench_arithmetic_2_digit_multiplication | 0 | multiple_choice_grade | 0.2400 | ± 0.0429 |
| bigbench_arithmetic_2_digit_subtraction | 0 | multiple_choice_grade | 0.1800 | ± 0.0386 |
| bigbench_arithmetic_3_digit_addition | 0 | multiple_choice_grade | 0.3300 | ± 0.0473 |
| bigbench_arithmetic_3_digit_division | 0 | multiple_choice_grade | 0.2100 | ± 0.0409 |
| bigbench_arithmetic_3_digit_multiplication | 0 | multiple_choice_grade | 0.3000 | ± 0.0461 |
| bigbench_arithmetic_3_digit_subtraction | 0 | multiple_choice_grade | 0.5500 | ± 0.0500 |
| bigbench_arithmetic_4_digit_addition | 0 | multiple_choice_grade | 0.2800 | ± 0.0451 |
| bigbench_arithmetic_4_digit_division | 0 | multiple_choice_grade | 0.2500 | ± 0.0435 |
| bigbench_arithmetic_4_digit_multiplication | 0 | multiple_choice_grade | 0.1500 | ± 0.0359 |
| bigbench_arithmetic_4_digit_subtraction | 0 | multiple_choice_grade | 0.4400 | ± 0.0499 |
| bigbench_arithmetic_5_digit_addition | 0 | multiple_choice_grade | 0.5100 | ± 0.0502 |
| bigbench_arithmetic_5_digit_division | 0 | multiple_choice_grade | 0.3000 | ± 0.0461 |
| bigbench_arithmetic_5_digit_multiplication | 0 | multiple_choice_grade | 0.3100 | ± 0.0465 |
| bigbench_arithmetic_5_digit_subtraction | 0 | multiple_choice_grade | 0.4000 | ± 0.0492 |
| bigbench_cause_and_effect_one_sentence | 0 | multiple_choice_grade | 0.5882 | ± 0.0696 |
| bigbench_cause_and_effect_one_sentence_no_prompt | 0 | multiple_choice_grade | 0.3922 | ± 0.0690 |
| bigbench_cause_and_effect_two_sentences | 0 | multiple_choice_grade | 0.4510 | ± 0.0704 |
| bigbench_emotions | 0 | multiple_choice_grade | 0.1938 | ± 0.0313 |
| bigbench_empirical_judgments | 0 | multiple_choice_grade | 0.3434 | ± 0.0480 |
| bigbench_general_knowledge | 0 | multiple_choice_grade | 0.2714 | ± 0.0535 |
| bigbench_hhh_alignment_harmless | 0 | multiple_choice_grade | 0.3966 | ± 0.0648 |
| bigbench_hhh_alignment_helpful | 0 | multiple_choice_grade | 0.3729 | ± 0.0635 |
| bigbench_hhh_alignment_honest | 0 | multiple_choice_grade | 0.3390 | ± 0.0622 |
| bigbench_hhh_alignment_other | 0 | multiple_choice_grade | 0.5581 | ± 0.0766 |
| bigbench_intent_recognition | 0 | multiple_choice_grade | 0.0925 | ± 0.0110 |
| bigbench_misconceptions | 0 | multiple_choice_grade | 0.4403 | ± 0.0430 |
| bigbench_paraphrase | 0 | multiple_choice_grade | 0.5000 | ± 0.0354 |
| bigbench_sentence_ambiguity | 0 | multiple_choice_grade | 0.4833 | ± 0.0651 |
| bigbench_similarities_abstraction | 0 | multiple_choice_grade | 0.5921 | ± 0.0567 |
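
As a sanity check, both summary columns in the evaluation table further up are consistent with plain unweighted means of these per-task scores: averaging all 36 tasks reproduces the FIN-bench score, and dropping the 20 bigbench_arithmetic_* tasks reproduces the "Without math" column. The grouping below is my own reading, not something stated in the card:

```python
# Per-task multiple_choice_grade values copied from the table above.
arithmetic = [  # bigbench_arithmetic_{1..5}_digit_{addition,division,multiplication,subtraction}
    0.0300, 0.0435, 0.0200, 0.0700,  # 1-digit
    0.2200, 0.0800, 0.2400, 0.1800,  # 2-digit
    0.3300, 0.2100, 0.3000, 0.5500,  # 3-digit
    0.2800, 0.2500, 0.1500, 0.4400,  # 4-digit
    0.5100, 0.3000, 0.3100, 0.4000,  # 5-digit
]
other = [
    0.4846,                          # analogies
    0.5882, 0.3922, 0.4510,          # cause_and_effect variants
    0.1938, 0.3434, 0.2714,          # emotions, empirical_judgments, general_knowledge
    0.3966, 0.3729, 0.3390, 0.5581,  # hhh_alignment variants
    0.0925, 0.4403, 0.5000,          # intent_recognition, misconceptions, paraphrase
    0.4833, 0.5921,                  # sentence_ambiguity, similarities_abstraction
]

print(f"FIN-bench:    {sum(arithmetic + other) / 36:.4f}")  # 0.3170
print(f"Without math: {sum(other) / 16:.4f}")               # 0.4062
```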

Uploaded Llama-3.2-Finnish-Wikipedia-1B model

  • Developed by: mpasila
  • License: Llama 3.2 Community License Agreement
  • Finetuned from model: unsloth/Llama-3.2-1B

This Llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.

