collapse_gemma-2-2b_hs2_accumulatesubsample_iter18_sftsd0

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.2320
  • Num Input Tokens Seen: 4965616
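
Although the descriptive sections below are placeholders, the checkpoint is a standard Gemma-2 causal LM and should load through the usual transformers API. The snippet below is a minimal sketch, assuming the framework versions listed further down; the prompt is an arbitrary placeholder, not an evaluated example.

```python
# Minimal loading/generation sketch, assuming the Transformers/PyTorch
# versions listed under "Framework versions" below. The prompt is arbitrary.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RylanSchaeffer/collapse_gemma-2-2b_hs2_accumulatesubsample_iter18_sftsd0"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    device_map="auto",           # requires the accelerate package
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```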

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 0
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
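
These settings map one-to-one onto transformers.TrainingArguments. The sketch below is a reconstruction, not the original training script: only the listed values come from this card, while the output directory, BF16 flag, and the dataset/Trainer wiring are assumptions.

```python
# Hedged reconstruction of the hyperparameters above as TrainingArguments.
# Only the listed values are taken from the card; output_dir and bf16 are
# assumptions, and the dataset/Trainer setup is not documented.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulatesubsample_iter18_sftsd0",  # assumed
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=0,
    gradient_accumulation_steps=16,  # 8 * 16 = 128 total train batch size
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    adam_beta1=0.9,                  # "Adam with betas=(0.9, 0.999)"
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                       # assumption, consistent with the BF16 weights
)
```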

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| No log        | 0      | 0    | 1.3909          | 0                 |
| 1.2383        | 0.0535 | 5    | 1.2809          | 261784            |
| 1.0702        | 0.1070 | 10   | 1.2352          | 527160            |
| 0.9047        | 0.1604 | 15   | 1.2298          | 796832            |
| 0.7863        | 0.2139 | 20   | 1.2596          | 1057200           |
| 0.7233        | 0.2674 | 25   | 1.2640          | 1317152           |
| 0.6138        | 0.3209 | 30   | 1.2867          | 1585112           |
| 0.6639        | 0.3743 | 35   | 1.2563          | 1848712           |
| 0.4351        | 0.4278 | 40   | 1.2637          | 2116952           |
| 0.4406        | 0.4813 | 45   | 1.2563          | 2389216           |
| 0.4663        | 0.5348 | 50   | 1.2317          | 2659560           |
| 0.5592        | 0.5882 | 55   | 1.2441          | 2932656           |
| 0.4722        | 0.6417 | 60   | 1.2254          | 3199512           |
| 0.5026        | 0.6952 | 65   | 1.2319          | 3467576           |
| 0.4221        | 0.7487 | 70   | 1.2160          | 3732536           |
| 0.3425        | 0.8021 | 75   | 1.2294          | 4000632           |
| 0.453         | 0.8556 | 80   | 1.2140          | 4266888           |
| 0.4114        | 0.9091 | 85   | 1.2336          | 4531504           |
| 0.4125        | 0.9626 | 90   | 1.2095          | 4801232           |
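
The validation-loss trajectory is easier to read as a plot. The snippet below simply re-plots the table; the values are transcribed verbatim from it, and matplotlib is an assumed (not documented) dependency.

```python
# Re-plot the validation-loss column of the training-results table above.
# Evaluations ran every 5 steps, from step 0 through step 90.
import matplotlib.pyplot as plt

steps = list(range(0, 95, 5))  # 0, 5, ..., 90
val_loss = [
    1.3909, 1.2809, 1.2352, 1.2298, 1.2596, 1.2640, 1.2867,
    1.2563, 1.2637, 1.2563, 1.2317, 1.2441, 1.2254, 1.2319,
    1.2160, 1.2294, 1.2140, 1.2336, 1.2095,
]

plt.plot(steps, val_loss, marker="o")
plt.xlabel("Step")
plt.ylabel("Validation loss")
plt.title("collapse_gemma-2-2b_hs2_accumulatesubsample_iter18_sftsd0")
plt.show()
```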

Framework versions

  • Transformers 4.44.0
  • PyTorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1

Model size: 2.61B params (Safetensors, BF16)
