collapse_gemma-2-2b_hs2_replace_iter16_sftsd1

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 2.4970
  • Num Input Tokens Seen: 4537240
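
For intuition, the evaluation loss can be converted to a perplexity. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language model training with Transformers but is not stated in this card.

```python
import math

# Evaluation loss reported in this card.
eval_loss = 2.4970

# Assumption: the loss is a mean per-token cross-entropy in nats,
# so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

print(f"perplexity = {perplexity:.2f}")
```

Under that assumption the evaluation perplexity comes out at roughly 12.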

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 1
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
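
The relationship between these values can be checked with a little arithmetic. The following is a minimal sketch, not the training script: the single-device count, the estimated number of optimizer steps (extrapolated from the training results table, where step 95 is logged at epoch 0.9744), and the rounding-up of the warmup length are assumptions.

```python
import math

# Values taken from the hyperparameter list above.
train_batch_size = 8               # per-device batch size
gradient_accumulation_steps = 16
total_train_batch_size = 128
lr_scheduler_warmup_ratio = 0.05

# Assumption: one training device, the count that makes the reported
# total batch size consistent with the per-device batch size.
num_devices = 1
effective_batch = train_batch_size * gradient_accumulation_steps * num_devices

# Assumption: roughly 97 optimizer steps in the single epoch,
# extrapolated from "step 95 at epoch 0.9744" in the results table.
estimated_total_steps = round(95 / 0.9744)

# Assumption: warmup length is ratio * total steps, rounded up.
warmup_steps = math.ceil(lr_scheduler_warmup_ratio * estimated_total_steps)

print(effective_batch, estimated_total_steps, warmup_steps)
```

With these assumptions the learning rate would ramp up over about the first 5 optimizer steps and then stay constant at 8e-06.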

Training results

Training Loss  Epoch   Step  Validation Loss  Input Tokens Seen
No log         0       0     1.3909           0
1.6365         0.0513  5     1.2802           229400
0.8632         0.1026  10    1.2735           465264
0.5774         0.1538  15    1.4739           696504
0.2434         0.2051  20    1.7648           929152
0.1136         0.2564  25    1.9942           1159192
0.038          0.3077  30    2.1920           1391288
0.0573         0.3590  35    2.3179           1624688
0.0319         0.4103  40    2.4043           1855840
0.0727         0.4615  45    2.5002           2087048
0.0454         0.5128  50    2.4862           2320968
0.0301         0.5641  55    2.4569           2550960
0.0242         0.6154  60    2.4576           2792184
0.0236         0.6667  65    2.4494           3026136
0.025          0.7179  70    2.4515           3263360
0.0231         0.7692  75    2.4645           3501552
0.0242         0.8205  80    2.4850           3728216
0.0229         0.8718  85    2.4945           3963160
0.0226         0.9231  90    2.4950           4208032
0.023          0.9744  95    2.4998           4442400
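
In the logged results, training loss keeps falling while validation loss rises after step 10. A small sketch for locating the lowest validation loss and the final gap between the two losses is given below; it uses a subset of rows copied from the results above, and the tuple layout is only an illustrative choice.

```python
# (step, training_loss, validation_loss) for a subset of the logged rows.
# Step 0 has no training loss ("No log"), represented here as None.
rows = [
    (0, None, 1.3909),
    (5, 1.6365, 1.2802),
    (10, 0.8632, 1.2735),
    (15, 0.5774, 1.4739),
    (20, 0.2434, 1.7648),
    (50, 0.0454, 2.4862),
    (95, 0.023, 2.4998),
]

# Row with the lowest validation loss.
best_step, _, best_val = min(rows, key=lambda r: r[2])

# Gap between validation and training loss at the last logged step.
final_step, final_train, final_val = rows[-1]
gap = final_val - final_train

print(f"lowest validation loss {best_val} at step {best_step}")
print(f"validation minus training loss at step {final_step}: {gap:.4f}")
```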

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
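
With the library versions above, the checkpoint could be loaded roughly as follows. This is a hedged sketch, not a documented usage recipe: the `load_model` helper is hypothetical, and it assumes the repository ships tokenizer files and that loading in bfloat16 is appropriate. If the tokenizer is missing, loading it from google/gemma-2-2b would be the natural fallback.

```python
# Repository id as shown on the model page.
MODEL_ID = "RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter16_sftsd1"


def load_model(model_id: str = MODEL_ID):
    """Hypothetical helper: load the tokenizer and model in bfloat16."""
    # Imported lazily so that defining the helper does not require
    # the heavy dependencies to be installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the repository includes tokenizer files.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
    )
    return tokenizer, model
```

Calling `load_model()` downloads the weights, so it needs network access and enough memory for a model of this size.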

Model size: 2.61B params (Safetensors, tensor type BF16)

Model tree: RylanSchaeffer/collapse_gemma-2-2b_hs2_replace_iter16_sftsd1, fine-tuned from the base model google/gemma-2-2b