Here's a "continued pre-trained" model using Finnish Wikipedia dataset. I still don't understand why no one in Finland has figured out that they could just do continued pre-training on existing models that are already supported by every frontend.. I've seen Japanese models perform pretty well with that kind of continued pre-training, yet Finnish models are still done from scratch which means they suck ass. If you compare them to Llama 3 or Gemma 2 they just suck so much. They can't even match Mistral 7B a model from last year. Just stop wasting money on training models from scratch, use these better models as base and train it on all your closed-source data I don't have access to. Thank you.

LoRA: mpasila/Llama-3.2-Finnish-Wikipedia-LoRA-1B
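
If you'd rather apply the adapter yourself instead of using the merged weights, here's a minimal sketch with Transformers + PEFT (not part of the card; the prompt and generation settings are just for illustration):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model this card says it was finetuned from, in BF16
# to match the published tensor type.
base = AutoModelForCausalLM.from_pretrained(
    "unsloth/Llama-3.2-1B", torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("unsloth/Llama-3.2-1B")

# Attach the LoRA adapter, then fold it into the base weights so the
# result behaves like the merged checkpoint.
model = PeftModel.from_pretrained(base, "mpasila/Llama-3.2-Finnish-Wikipedia-LoRA-1B")
model = model.merge_and_unload()

# This is a base model, so use plain continuation rather than chat prompts.
inputs = tokenizer("Suomen pääkaupunki on", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=40)[0]))
```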

Trained with regular LoRA (not quantized/QLoRA), with LoRA rank 128 and alpha set to 32. Trained for 1 epoch on an RTX 4090 for about 12.5 hours.
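
The card doesn't include the training script, but the stated settings (rank 128, alpha 32, no quantization, 1 epoch) fit the usual Unsloth + TRL recipe roughly as below. Everything not stated above, including the exact Wikipedia dump, sequence length, batch size, and learning rate, is an assumption:

```python
from unsloth import FastLanguageModel
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

# Base model in full precision (no 4-bit loading, i.e. LoRA rather than QLoRA).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-1B",
    max_seq_length=2048,   # assumed
    load_in_4bit=False,    # regular LoRA, as stated above
)

# LoRA rank 128 and alpha 32, as stated above; the target modules are the
# usual Unsloth defaults for Llama-style models.
model = FastLanguageModel.get_peft_model(
    model,
    r=128,
    lora_alpha=32,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
)

# Assumed dataset: the Finnish subset of a Wikipedia dump on the Hub.
dataset = load_dataset("wikimedia/wikipedia", "20231101.fi", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,   # assumed
        gradient_accumulation_steps=4,   # assumed
        num_train_epochs=1,              # 1 epoch, as stated above
        learning_rate=2e-4,              # assumed
        bf16=True,
        output_dir="outputs",
    ),
)
trainer.train()
```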

It does still have some issues, but I might try training on Gemma 2 2B next to see whether that makes a better base for this (Gemma 2 is already better at Finnish than Llama 3), and maybe add more Finnish-language datasets.

Evaluation

| Model | Size | Type | FIN-bench (score) | Without math |
|---|---|---|---|---|
| mpasila/Llama-3.2-Finnish-Wikipedia-1B | 1B | Base | 0.3170 | 0.4062 |
| unsloth/Llama-3.2-1B | 1B | Base | 0.4029 | 0.3881 |
| Finnish-NLP/llama-7b-finnish | 7B | Base | 0.2350 | 0.4203 |
| LumiOpen/Viking-7B (1000B) | 7B | Base | 0.3721 | 0.4453 |
| HPLT/gpt-7b-nordic-prerelease | 7B | Base | 0.3169 | 0.4524 |

Source

FIN-bench scores:

| Task | Version | Metric | Value | Stderr |
|---|---|---|---|---|
| bigbench_analogies | 0 | multiple_choice_grade | 0.4846 | ± 0.0440 |
| bigbench_arithmetic_1_digit_addition | 0 | multiple_choice_grade | 0.0300 | ± 0.0171 |
| bigbench_arithmetic_1_digit_division | 0 | multiple_choice_grade | 0.0435 | ± 0.0435 |
| bigbench_arithmetic_1_digit_multiplication | 0 | multiple_choice_grade | 0.0200 | ± 0.0141 |
| bigbench_arithmetic_1_digit_subtraction | 0 | multiple_choice_grade | 0.0700 | ± 0.0256 |
| bigbench_arithmetic_2_digit_addition | 0 | multiple_choice_grade | 0.2200 | ± 0.0416 |
| bigbench_arithmetic_2_digit_division | 0 | multiple_choice_grade | 0.0800 | ± 0.0273 |
| bigbench_arithmetic_2_digit_multiplication | 0 | multiple_choice_grade | 0.2400 | ± 0.0429 |
| bigbench_arithmetic_2_digit_subtraction | 0 | multiple_choice_grade | 0.1800 | ± 0.0386 |
| bigbench_arithmetic_3_digit_addition | 0 | multiple_choice_grade | 0.3300 | ± 0.0473 |
| bigbench_arithmetic_3_digit_division | 0 | multiple_choice_grade | 0.2100 | ± 0.0409 |
| bigbench_arithmetic_3_digit_multiplication | 0 | multiple_choice_grade | 0.3000 | ± 0.0461 |
| bigbench_arithmetic_3_digit_subtraction | 0 | multiple_choice_grade | 0.5500 | ± 0.0500 |
| bigbench_arithmetic_4_digit_addition | 0 | multiple_choice_grade | 0.2800 | ± 0.0451 |
| bigbench_arithmetic_4_digit_division | 0 | multiple_choice_grade | 0.2500 | ± 0.0435 |
| bigbench_arithmetic_4_digit_multiplication | 0 | multiple_choice_grade | 0.1500 | ± 0.0359 |
| bigbench_arithmetic_4_digit_subtraction | 0 | multiple_choice_grade | 0.4400 | ± 0.0499 |
| bigbench_arithmetic_5_digit_addition | 0 | multiple_choice_grade | 0.5100 | ± 0.0502 |
| bigbench_arithmetic_5_digit_division | 0 | multiple_choice_grade | 0.3000 | ± 0.0461 |
| bigbench_arithmetic_5_digit_multiplication | 0 | multiple_choice_grade | 0.3100 | ± 0.0465 |
| bigbench_arithmetic_5_digit_subtraction | 0 | multiple_choice_grade | 0.4000 | ± 0.0492 |
| bigbench_cause_and_effect_one_sentence | 0 | multiple_choice_grade | 0.5882 | ± 0.0696 |
| bigbench_cause_and_effect_one_sentence_no_prompt | 0 | multiple_choice_grade | 0.3922 | ± 0.0690 |
| bigbench_cause_and_effect_two_sentences | 0 | multiple_choice_grade | 0.4510 | ± 0.0704 |
| bigbench_emotions | 0 | multiple_choice_grade | 0.1938 | ± 0.0313 |
| bigbench_empirical_judgments | 0 | multiple_choice_grade | 0.3434 | ± 0.0480 |
| bigbench_general_knowledge | 0 | multiple_choice_grade | 0.2714 | ± 0.0535 |
| bigbench_hhh_alignment_harmless | 0 | multiple_choice_grade | 0.3966 | ± 0.0648 |
| bigbench_hhh_alignment_helpful | 0 | multiple_choice_grade | 0.3729 | ± 0.0635 |
| bigbench_hhh_alignment_honest | 0 | multiple_choice_grade | 0.3390 | ± 0.0622 |
| bigbench_hhh_alignment_other | 0 | multiple_choice_grade | 0.5581 | ± 0.0766 |
| bigbench_intent_recognition | 0 | multiple_choice_grade | 0.0925 | ± 0.0110 |
| bigbench_misconceptions | 0 | multiple_choice_grade | 0.4403 | ± 0.0430 |
| bigbench_paraphrase | 0 | multiple_choice_grade | 0.5000 | ± 0.0354 |
| bigbench_sentence_ambiguity | 0 | multiple_choice_grade | 0.4833 | ± 0.0651 |
| bigbench_similarities_abstraction | 0 | multiple_choice_grade | 0.5921 | ± 0.0567 |
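
As a sanity check, both summary columns in the evaluation table further up are consistent with plain unweighted means of these per-task scores: averaging all 36 tasks reproduces the FIN-bench score, and dropping the 20 bigbench_arithmetic_* tasks reproduces the "Without math" column. The grouping below is my own reading, not something stated in the card:

```python
# Per-task multiple_choice_grade values copied from the table above.
arithmetic = [  # bigbench_arithmetic_{1..5}_digit_{addition,division,multiplication,subtraction}
    0.0300, 0.0435, 0.0200, 0.0700,  # 1-digit
    0.2200, 0.0800, 0.2400, 0.1800,  # 2-digit
    0.3300, 0.2100, 0.3000, 0.5500,  # 3-digit
    0.2800, 0.2500, 0.1500, 0.4400,  # 4-digit
    0.5100, 0.3000, 0.3100, 0.4000,  # 5-digit
]
other = [
    0.4846,                          # analogies
    0.5882, 0.3922, 0.4510,          # cause_and_effect variants
    0.1938, 0.3434, 0.2714,          # emotions, empirical_judgments, general_knowledge
    0.3966, 0.3729, 0.3390, 0.5581,  # hhh_alignment variants
    0.0925, 0.4403, 0.5000,          # intent_recognition, misconceptions, paraphrase
    0.4833, 0.5921,                  # sentence_ambiguity, similarities_abstraction
]

print(f"FIN-bench:    {sum(arithmetic + other) / 36:.4f}")  # 0.3170
print(f"Without math: {sum(other) / 16:.4f}")               # 0.4062
```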

Uploaded Llama-3.2-Finnish-Wikipedia-1B model

  • Developed by: mpasila
  • License: Llama 3.2 Community License Agreement
  • Finetuned from model: unsloth/Llama-3.2-1B

This Llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.

