
collapse_gemma-2-2b_hs2_accumulatesubsample_iter17_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.2128
  • Num Input Tokens Seen: 5003104
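
For reference, the reported evaluation loss of 1.2128 corresponds to a perplexity of exp(1.2128) ≈ 3.36. Since this is a fine-tuned google/gemma-2-2b checkpoint stored in BF16, it should load with the standard transformers causal-LM API; the card does not include a usage example, so the snippet below is a minimal sketch rather than an official recipe.

```python
# Minimal sketch: loading this checkpoint with the standard transformers API.
# The repo id is taken from the card; BF16 matches the card's listed tensor type.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter17_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```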

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
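
These hyperparameters map directly onto transformers.TrainingArguments. The card does not state which training script was used, so the sketch below is only an illustrative reconstruction of how such a run could be configured with the Hugging Face Trainer; note that total_train_batch_size = 128 is derived (8 per-device × 16 accumulation steps) rather than set directly.

```python
# Hypothetical reconstruction of the training configuration; the actual
# training script is not given in the card, so treat this as a sketch only.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulatesubsample_iter17_sftsd2",
    learning_rate=8e-6,
    per_device_train_batch_size=8,    # train_batch_size: 8
    per_device_eval_batch_size=16,    # eval_batch_size: 16
    seed=2,
    gradient_accumulation_steps=16,   # effective batch size: 8 * 16 = 128
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                   # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                        # assumed from the card's BF16 tensor type
)
```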

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.2643        | 0.0530 | 5    | 1.2796          | 263768            |
| 1.0578        | 0.1060 | 10   | 1.2160          | 538880            |
| 0.9748        | 0.1590 | 15   | 1.2198          | 798984            |
| 0.8303        | 0.2121 | 20   | 1.2433          | 1063864           |
| 0.7147        | 0.2651 | 25   | 1.2522          | 1327624           |
| 0.7074        | 0.3181 | 30   | 1.2652          | 1591744           |
| 0.6043        | 0.3711 | 35   | 1.2441          | 1864144           |
| 0.532         | 0.4241 | 40   | 1.2657          | 2123440           |
| 0.4841        | 0.4771 | 45   | 1.2331          | 2391152           |
| 0.4656        | 0.5302 | 50   | 1.2461          | 2657288           |
| 0.4954        | 0.5832 | 55   | 1.2342          | 2917880           |
| 0.4648        | 0.6362 | 60   | 1.2295          | 3178608           |
| 0.4401        | 0.6892 | 65   | 1.2343          | 3447880           |
| 0.4389        | 0.7422 | 70   | 1.2254          | 3714296           |
| 0.4304        | 0.7952 | 75   | 1.2193          | 3978736           |
| 0.4205        | 0.8482 | 80   | 1.2232          | 4250824           |
| 0.3847        | 0.9013 | 85   | 1.2118          | 4525704           |
| 0.4281        | 0.9543 | 90   | 1.2189          | 4792840           |

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
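
To reproduce the environment, the pinned versions above can be verified at runtime; a minimal sketch:

```python
# Quick environment check against the framework versions listed above.
import datasets, tokenizers, torch, transformers

print(transformers.__version__)  # expected: 4.44.0
print(torch.__version__)         # expected: 2.4.0+cu121
print(datasets.__version__)      # expected: 2.20.0
print(tokenizers.__version__)    # expected: 0.19.1
```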

Safetensors

  • Model size: 2.61B params
  • Tensor type: BF16
