
collapse_gemma-2-2b_hs2_accumulatesubsample_iter19_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.2109
  • Num Input Tokens Seen: 4937688
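
The card provides no usage example, so below is a minimal inference sketch, assuming the checkpoint loads like any standard Gemma-2 causal LM. The repo id is taken from this card; the prompt, dtype, and device placement are illustrative assumptions.

```python
# Minimal inference sketch; assumes the checkpoint behaves like a standard
# Gemma-2 causal LM. bfloat16 is an assumption consistent with Gemma-2's
# native precision; device_map="auto" requires the accelerate package.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter19_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```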

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a reconstruction sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
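
The original training script is not published; the sketch below is a hedged reconstruction of how the hyperparameters above map onto Hugging Face TrainingArguments. Anything not listed on the card (output_dir, bf16) is an assumption, and total_train_batch_size is derived rather than passed directly.

```python
# Hedged reconstruction of the card's hyperparameters via the Trainer API.
# The effective batch size of 128 = per_device_train_batch_size (8)
# * gradient_accumulation_steps (16), assuming a single device.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulatesubsample_iter19_sftsd0",  # assumption
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,  # assumption, consistent with the BF16 checkpoint weights
)
```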

Training results

Training Loss  Epoch   Step  Validation Loss  Input Tokens Seen
No log         0          0  1.3909                           0
1.2963         0.0534     5  1.2824                      266400
1.0496         0.1067    10  1.2285                      538144
0.9147         0.1601    15  1.2357                      798048
0.773          0.2135    20  1.2779                     1067320
0.635          0.2668    25  1.2615                     1330240
0.6584         0.3202    30  1.2861                     1596344
0.5049         0.3736    35  1.2864                     1861336
0.5945         0.4270    40  1.2556                     2123472
0.5127         0.4803    45  1.2573                     2388944
0.5269         0.5337    50  1.2402                     2657312
0.4278         0.5871    55  1.2537                     2921360
0.4665         0.6404    60  1.2328                     3187968
0.5046         0.6938    65  1.2301                     3456832
0.5073         0.7472    70  1.2264                     3722120
0.3945         0.8005    75  1.2260                     3983472
0.338          0.8539    80  1.2163                     4252048
0.3709         0.9073    85  1.2219                     4516808
0.4595         0.9606    90  1.2109                     4777360
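
As a quick arithmetic check on the table, the input-token column grows by roughly 53k tokens per optimizer step between the first and last logged evaluations, consistent with the effective batch size of 128 sequences:

```python
# Sanity check on the logged token counts (values copied from the table above).
tokens_at_step_5 = 266400
tokens_at_step_90 = 4777360
steps = 90 - 5
tokens_per_step = (tokens_at_step_90 - tokens_at_step_5) / steps
print(f"~{tokens_per_step:,.0f} tokens per optimizer step")   # ~53,070
print(f"~{tokens_per_step / 128:.0f} tokens per sequence")    # ~415 on average
```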

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
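
For exact reproduction, the versions above can be verified at runtime; this is a minimal sketch assuming standard pip installations of each package.

```python
# Verify that the installed packages match the versions reported on this card.
import datasets
import tokenizers
import torch
import transformers

expected = {
    "transformers": (transformers.__version__, "4.44.0"),
    "torch": (torch.__version__, "2.4.0+cu121"),
    "datasets": (datasets.__version__, "2.20.0"),
    "tokenizers": (tokenizers.__version__, "0.19.1"),
}
for name, (installed, reported) in expected.items():
    status = "OK" if installed == reported else "differs"
    print(f"{name}: installed {installed}, card reports {reported} ({status})")
```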