
collapse_gemma-2-2b_hs2_accumulatesubsample_iter18_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.2012
  • Num Input Tokens Seen: 4866032
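A minimal usage sketch for loading this checkpoint with the Transformers library follows. The repository id is this model's Hub path; the prompt and generation settings are illustrative assumptions, not part of the original card.

```python
# Minimal sketch: load the fine-tuned checkpoint for text generation.
# torch_dtype=torch.bfloat16 matches the BF16 tensor type of the stored
# weights; the prompt and max_new_tokens are arbitrary examples.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter18_sftsd1"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```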

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128 (train_batch_size × gradient_accumulation_steps = 8 × 16)
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
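For reference, the hyperparameters above map onto the Hugging Face Trainer's TrainingArguments roughly as sketched below. Only the values listed in this card are grounded; output_dir and the bf16 flag are assumptions, and the actual training script is not part of this card.

```python
# Sketch of the hyperparameters above expressed as TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulatesubsample_iter18_sftsd1",  # hypothetical path
    learning_rate=8e-06,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=1,
    gradient_accumulation_steps=16,  # effective batch size: 8 * 16 = 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    bf16=True,  # assumption, consistent with the BF16 checkpoint
)
```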

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.4189        | 0.0527 | 5    | 1.2765          | 258624            |
| 0.9994        | 0.1053 | 10   | 1.2087          | 518872            |
| 0.9465        | 0.1580 | 15   | 1.2227          | 764216            |
| 0.6926        | 0.2107 | 20   | 1.2787          | 1024832           |
| 0.7235        | 0.2633 | 25   | 1.2728          | 1288048           |
| 0.6502        | 0.3160 | 30   | 1.2796          | 1549192           |
| 0.5507        | 0.3687 | 35   | 1.2801          | 1810408           |
| 0.4606        | 0.4213 | 40   | 1.2544          | 2071688           |
| 0.3668        | 0.4740 | 45   | 1.2498          | 2323016           |
| 0.3580        | 0.5267 | 50   | 1.2442          | 2589208           |
| 0.3527        | 0.5793 | 55   | 1.2084          | 2844384           |
| 0.4372        | 0.6320 | 60   | 1.2294          | 3100696           |
| 0.3068        | 0.6847 | 65   | 1.2174          | 3351336           |
| 0.3254        | 0.7373 | 70   | 1.2254          | 3615008           |
| 0.3402        | 0.7900 | 75   | 1.2190          | 3868904           |
| 0.3489        | 0.8427 | 80   | 1.2200          | 4132088           |
| 0.2991        | 0.8953 | 85   | 1.2094          | 4391104           |
| 0.2674        | 0.9480 | 90   | 1.2146          | 4654296           |

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
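When reproducing results, the pins above can be verified at runtime. This is a small sketch, assuming only the standard __version__ attribute exposed by each package.

```python
# Quick check that the local environment matches the pinned versions above.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": "4.44.0",
    "torch": "2.4.0+cu121",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, want in expected.items():
    have = installed[name]
    status = "OK" if have == want else f"MISMATCH (have {have})"
    print(f"{name}=={want}: {status}")
```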
Model details

  • Model size: 2.61B params
  • Tensor type: BF16
  • Format: Safetensors
