mon_nllb_3B_r32

This model is a fine-tuned version of facebook/nllb-200-distilled-1.3B on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 7.2132
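
This repository ships a PEFT adapter on top of the base checkpoint (see the framework versions below), so inference requires loading the base model first and then attaching the adapter. The snippet below is a minimal, untested sketch: the repository id Billyyy/mon_nllb_3B_r32 is taken from this page, and the target-language code "khk_Cyrl" (Halh Mongolian) is only a guess based on the "mon" prefix in the model name.

```python
# Minimal loading sketch (assumptions: adapter repo id Billyyy/mon_nllb_3B_r32,
# target language khk_Cyrl; adjust both to your actual setup).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import PeftModel

base_id = "facebook/nllb-200-distilled-1.3B"
adapter_id = "Billyyy/mon_nllb_3B_r32"

tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.src_lang = "eng_Latn"  # source-language code, example only

base_model = AutoModelForSeq2SeqLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base_model, adapter_id)  # attach the LoRA adapter

inputs = tokenizer("Hello, how are you?", return_tensors="pt")
generated = model.generate(
    **inputs,
    # NLLB language codes are vocabulary tokens, so look up the target code directly.
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("khk_Cyrl"),
    max_new_tokens=64,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```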

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 40
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 160
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 2
  • mixed_precision_training: Native AMP
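
For reference, these settings roughly correspond to the Seq2SeqTrainingArguments below. This is a hedged reconstruction, not the author's actual training script; the dataset, adapter configuration, and output directory are unknown.

```python
# Approximate reconstruction of the listed hyperparameters (assumptions noted inline).
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mon_nllb_3B_r32",    # assumed output directory
    learning_rate=1e-4,
    per_device_train_batch_size=40,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=4,   # effective batch size 40 * 4 = 160
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_steps=500,
    num_train_epochs=2,
    fp16=True,                       # "Native AMP"; could equally have been bf16
    eval_strategy="steps",
    eval_steps=500,                  # matches the 500-step evaluation interval below
)
```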

Training results

Training Loss | Epoch  | Step  | Validation Loss
7.4511        | 0.0761 |   500 | 7.2785
7.3373        | 0.1522 |  1000 | 7.2305
7.2568        | 0.2283 |  1500 | 7.2138
7.2365        | 0.3044 |  2000 | 7.2126
7.2619        | 0.3805 |  2500 | 7.2130
7.2272        | 0.4567 |  3000 | 7.2117
7.2336        | 0.5328 |  3500 | 7.2137
7.2263        | 0.6089 |  4000 | 7.2139
7.2321        | 0.6850 |  4500 | 7.2129
7.2257        | 0.7611 |  5000 | 7.2124
7.2248        | 0.8372 |  5500 | 7.2121
7.2289        | 0.9133 |  6000 | 7.2121
7.2144        | 0.9894 |  6500 | 7.2131
7.2155        | 1.0656 |  7000 | 7.2133
7.2150        | 1.1417 |  7500 | 7.2130
7.2146        | 1.2178 |  8000 | 7.2122
7.1995        | 1.2939 |  8500 | 7.2126
7.2025        | 1.3700 |  9000 | 7.2136
7.2302        | 1.4462 |  9500 | 7.2128
7.2078        | 1.5223 | 10000 | 7.2133
7.2063        | 1.5984 | 10500 | 7.2133
7.2160        | 1.6745 | 11000 | 7.2128
7.1949        | 1.7506 | 11500 | 7.2132
7.2213        | 1.8267 | 12000 | 7.2131
7.2236        | 1.9028 | 12500 | 7.2132
7.2244        | 1.9789 | 13000 | 7.2132

Framework versions

  • PEFT 0.14.0
  • Transformers 4.49.0
  • PyTorch 2.6.0+cu124
  • Datasets 3.3.2
  • Tokenizers 0.21.0
