mon_nllb_1.3B

This model is a LoRA fine-tuned version of facebook/nllb-200-distilled-1.3B for Mongolian→English translation. It achieves the following results on the evaluation set (FLORES-200):

  • BLEU: 44.06
  • chrF: 44.43
  • METEOR: 0.537

Example Usage

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "Billyyy/mon_nllb_1.3B"

# NLLB tokenizers take FLORES-200 language codes; Halh Mongolian (Cyrillic) is "khk_Cyrl".
tokenizer = AutoTokenizer.from_pretrained(model_name, src_lang="khk_Cyrl")
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

text = "Сайн байна уу?"
inputs = tokenizer(text, return_tensors="pt")

# Force the decoder to start with the English language token ("eng_Latn").
output_tokens = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("eng_Latn"),
)
translated_text = tokenizer.decode(output_tokens[0], skip_special_tokens=True)

print(translated_text)
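
Since this repository ships a PEFT (LoRA) adapter rather than a full checkpoint (see "Framework versions" below), the adapter can also be attached to the base model explicitly. A minimal sketch:

from peft import PeftModel
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the frozen base model, then apply the LoRA adapter weights on top.
base_model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-1.3B")
model = PeftModel.from_pretrained(base_model, "Billyyy/mon_nllb_1.3B")
tokenizer = AutoTokenizer.from_pretrained("Billyyy/mon_nllb_1.3B", src_lang="khk_Cyrl")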

Model description

This model was fine-tuned on a Mongolian→English parallel dataset using LoRA.
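
The exact adapter configuration is not listed on this card. As an illustrative sketch only, a comparable PEFT LoRA setup might look like the following; the rank, alpha, dropout, and target modules are assumptions, not the values actually used:

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSeq2SeqLM

base_model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-1.3B")

# Hypothetical adapter settings; the card does not state the actual ones.
lora_config = LoraConfig(
    task_type="SEQ_2_SEQ_LM",
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections, a common choice
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the LoRA weights are trainable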

Training and evaluation data

Training data:

Evaluation data:

  • FLORES-200
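
The scoring script is not included on the card. As a sketch, metrics like those reported above can be computed with Hugging Face's evaluate library; the placeholder sentences stand in for real model outputs and FLORES-200 English references:

import evaluate

# The three metrics reported on this card.
bleu = evaluate.load("sacrebleu")
chrf = evaluate.load("chrf")
meteor = evaluate.load("meteor")

predictions = ["Hello, how are you?"]  # placeholder model outputs
references = ["Hello, how are you?"]   # placeholder reference translations

print(bleu.compute(predictions=predictions, references=[[r] for r in references])["score"])
print(chrf.compute(predictions=predictions, references=[[r] for r in references])["score"])
print(meteor.compute(predictions=predictions, references=references)["meteor"])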

Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list):

  • learning_rate: 0.0001
  • train_batch_size: 40
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 160
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 2
  • mixed_precision_training: FP16
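
As a sketch, these settings map onto transformers' Seq2SeqTrainingArguments as follows; output_dir is a placeholder, and any argument not listed above is left at its default:

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mon_nllb_1.3B",        # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=40,
    per_device_eval_batch_size=16,
    seed=42,
    gradient_accumulation_steps=4,     # 40 x 4 = 160 effective train batch size
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=500,
    num_train_epochs=2,
    fp16=True,                         # mixed-precision training
)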

Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|---------------|--------|-------|-----------------|
| 7.3708        | 0.1522 | 1000  | 7.2420          |
| 7.25          | 0.3044 | 2000  | 7.2126          |
| 7.237         | 0.4567 | 3000  | 7.2120          |
| 7.2344        | 0.6089 | 4000  | 7.2137          |
| 7.2323        | 0.7611 | 5000  | 7.2130          |
| 7.2351        | 0.9133 | 6000  | 7.2121          |
| 7.222         | 1.0656 | 7000  | 7.2131          |
| 7.22          | 1.2178 | 8000  | 7.2122          |
| 7.2077        | 1.3700 | 9000  | 7.2131          |
| 7.2132        | 1.5223 | 10000 | 7.2132          |
| 7.2211        | 1.6745 | 11000 | 7.2128          |
| 7.2269        | 1.8267 | 12000 | 7.2131          |
| 7.2296        | 1.9789 | 13000 | 7.2132          |

Framework versions

  • PEFT 0.14.0
  • Transformers 4.49.0
  • Pytorch 2.6.0+cu124
  • Datasets 3.3.2
  • Tokenizers 0.21.0